|
76 | 76 | <div data-md-component="skip"> |
77 | 77 |
|
78 | 78 |
|
79 | | - <a href="#run-custom-langchain-openai-model" class="md-skip"> |
| 79 | + <a href="#run-custom-model" class="md-skip"> |
80 | 80 | Skip to content |
81 | 81 | </a> |
82 | 82 |
|
|
435 | 435 | <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix> |
436 | 436 |
|
437 | 437 | <li class="md-nav__item"> |
438 | | - <a href="#run-custom-langchain-openai-model" class="md-nav__link"> |
| 438 | + <a href="#run-custom-model" class="md-nav__link"> |
439 | 439 | <span class="md-ellipsis"> |
440 | | - Run custom langchain OpenAI model |
| 440 | + Run custom model |
441 | 441 | </span> |
442 | 442 | </a> |
443 | 443 |
|
|
732 | 732 | <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix> |
733 | 733 |
|
734 | 734 | <li class="md-nav__item"> |
735 | | - <a href="#run-custom-langchain-openai-model" class="md-nav__link"> |
| 735 | + <a href="#run-custom-model" class="md-nav__link"> |
736 | 736 | <span class="md-ellipsis"> |
737 | | - Run custom langchain OpenAI model |
| 737 | + Run custom model |
738 | 738 | </span> |
739 | 739 | </a> |
740 | 740 |
|
|
778 | 778 |
|
779 | 779 | <h1>Custom models</h1> |
780 | 780 |
|
781 | | -<p>Note that small local models tend to trim long outputs and could require more careful tuning of data description. </p> |
782 | | -<h2 id="run-custom-langchain-openai-model">Run custom langchain OpenAI model</h2> |
783 | | -<p>You can instantiate <code>Parsera</code> with any chat model supported by LangChain, for example, to run the model from Azure:<br /> |
| 781 | +<p>All custom models are run with <a href="/features/extractors/#chunks-tabular-extractor"><code>ChunksTabularExtractor</code></a>, |
| 782 | +if you want a custom extractor, initialize it with the model of your choice.</p> |
| 783 | +<p>Note that small local models tend to trim long outputs and may require more careful tuning of the data description.</p> |
| 784 | +<h2 id="run-custom-model">Run custom model</h2> |
| 785 | +<p>You can instantiate <code>Parsera</code> with any chat model supported by LangChain, for example, to run <code>gpt-4o-mini</code> from OpenAI API:<br /> |
784 | 786 | <div class="language-python highlight"><pre><span></span><code><span id="__span-0-1"><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a><span class="kn">import</span><span class="w"> </span><span class="nn">os</span> |
785 | | -</span><span id="__span-0-2"><a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a><span class="kn">from</span><span class="w"> </span><span class="nn">langchain_openai</span><span class="w"> </span><span class="kn">import</span> <span class="n">AzureChatOpenAI</span> |
| 787 | +</span><span id="__span-0-2"><a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a><span class="kn">from</span><span class="w"> </span><span class="nn">langchain_openai</span><span class="w"> </span><span class="kn">import</span> <span class="n">ChatOpenAI</span> |
786 | 788 | </span><span id="__span-0-3"><a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a> |
787 | | -</span><span id="__span-0-4"><a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a><span class="n">llm</span> <span class="o">=</span> <span class="n">AzureChatOpenAI</span><span class="p">(</span> |
788 | | -</span><span id="__span-0-5"><a id="__codelineno-0-5" name="__codelineno-0-5" href="#__codelineno-0-5"></a> <span class="n">azure_endpoint</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"AZURE_GPT_BASE_URL"</span><span class="p">),</span> |
789 | | -</span><span id="__span-0-6"><a id="__codelineno-0-6" name="__codelineno-0-6" href="#__codelineno-0-6"></a> <span class="n">openai_api_version</span><span class="o">=</span><span class="s2">"2023-05-15"</span><span class="p">,</span> |
790 | | -</span><span id="__span-0-7"><a id="__codelineno-0-7" name="__codelineno-0-7" href="#__codelineno-0-7"></a> <span class="n">deployment_name</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"AZURE_GPT_DEPLOYMENT_NAME"</span><span class="p">),</span> |
791 | | -</span><span id="__span-0-8"><a id="__codelineno-0-8" name="__codelineno-0-8" href="#__codelineno-0-8"></a> <span class="n">openai_api_key</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="s2">"AZURE_GPT_API_KEY"</span><span class="p">),</span> |
792 | | -</span><span id="__span-0-9"><a id="__codelineno-0-9" name="__codelineno-0-9" href="#__codelineno-0-9"></a> <span class="n">openai_api_type</span><span class="o">=</span><span class="s2">"azure"</span><span class="p">,</span> |
793 | | -</span><span id="__span-0-10"><a id="__codelineno-0-10" name="__codelineno-0-10" href="#__codelineno-0-10"></a> <span class="n">temperature</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> |
794 | | -</span><span id="__span-0-11"><a id="__codelineno-0-11" name="__codelineno-0-11" href="#__codelineno-0-11"></a><span class="p">)</span> |
795 | | -</span><span id="__span-0-12"><a id="__codelineno-0-12" name="__codelineno-0-12" href="#__codelineno-0-12"></a> |
796 | | -</span><span id="__span-0-13"><a id="__codelineno-0-13" name="__codelineno-0-13" href="#__codelineno-0-13"></a><span class="n">url</span> <span class="o">=</span> <span class="s2">"https://github.com/raznem/parsera"</span> |
797 | | -</span><span id="__span-0-14"><a id="__codelineno-0-14" name="__codelineno-0-14" href="#__codelineno-0-14"></a><span class="n">elements</span> <span class="o">=</span> <span class="p">{</span> |
798 | | -</span><span id="__span-0-15"><a id="__codelineno-0-15" name="__codelineno-0-15" href="#__codelineno-0-15"></a> <span class="s2">"Stars"</span><span class="p">:</span> <span class="s2">"Number of stars"</span><span class="p">,</span> |
799 | | -</span><span id="__span-0-16"><a id="__codelineno-0-16" name="__codelineno-0-16" href="#__codelineno-0-16"></a> <span class="s2">"Fork"</span><span class="p">:</span> <span class="s2">"Number of forks"</span><span class="p">,</span> |
800 | | -</span><span id="__span-0-17"><a id="__codelineno-0-17" name="__codelineno-0-17" href="#__codelineno-0-17"></a><span class="p">}</span> |
801 | | -</span><span id="__span-0-18"><a id="__codelineno-0-18" name="__codelineno-0-18" href="#__codelineno-0-18"></a><span class="n">scrapper</span> <span class="o">=</span> <span class="n">Parsera</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">llm</span><span class="p">)</span> |
802 | | -</span><span id="__span-0-19"><a id="__codelineno-0-19" name="__codelineno-0-19" href="#__codelineno-0-19"></a><span class="n">result</span> <span class="o">=</span> <span class="n">scrapper</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="n">url</span><span class="p">,</span> <span class="n">elements</span><span class="o">=</span><span class="n">elements</span><span class="p">)</span> |
| 789 | +</span><span id="__span-0-4"><a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a><span class="n">llm</span> <span class="o">=</span> <span class="n">ChatOpenAI</span><span class="p">(</span> |
| 790 | +</span><span id="__span-0-5"><a id="__codelineno-0-5" name="__codelineno-0-5" href="#__codelineno-0-5"></a> <span class="n">model</span><span class="o">=</span><span class="s2">"gpt-4o-mini"</span><span class="p">,</span> |
| 791 | +</span><span id="__span-0-6"><a id="__codelineno-0-6" name="__codelineno-0-6" href="#__codelineno-0-6"></a> <span class="n">temperature</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> |
| 792 | +</span><span id="__span-0-7"><a id="__codelineno-0-7" name="__codelineno-0-7" href="#__codelineno-0-7"></a> <span class="n">timeout</span><span class="o">=</span><span class="mi">120</span><span class="p">,</span> |
| 793 | +</span><span id="__span-0-8"><a id="__codelineno-0-8" name="__codelineno-0-8" href="#__codelineno-0-8"></a><span class="p">)</span> |
| 794 | +</span><span id="__span-0-9"><a id="__codelineno-0-9" name="__codelineno-0-9" href="#__codelineno-0-9"></a> |
| 795 | +</span><span id="__span-0-10"><a id="__codelineno-0-10" name="__codelineno-0-10" href="#__codelineno-0-10"></a><span class="n">url</span> <span class="o">=</span> <span class="s2">"https://github.com/raznem/parsera"</span> |
| 796 | +</span><span id="__span-0-11"><a id="__codelineno-0-11" name="__codelineno-0-11" href="#__codelineno-0-11"></a><span class="n">elements</span> <span class="o">=</span> <span class="p">{</span> |
| 797 | +</span><span id="__span-0-12"><a id="__codelineno-0-12" name="__codelineno-0-12" href="#__codelineno-0-12"></a> <span class="s2">"Stars"</span><span class="p">:</span> <span class="s2">"Number of stars"</span><span class="p">,</span> |
| 798 | +</span><span id="__span-0-13"><a id="__codelineno-0-13" name="__codelineno-0-13" href="#__codelineno-0-13"></a> <span class="s2">"Fork"</span><span class="p">:</span> <span class="s2">"Number of forks"</span><span class="p">,</span> |
| 799 | +</span><span id="__span-0-14"><a id="__codelineno-0-14" name="__codelineno-0-14" href="#__codelineno-0-14"></a><span class="p">}</span> |
| 800 | +</span><span id="__span-0-15"><a id="__codelineno-0-15" name="__codelineno-0-15" href="#__codelineno-0-15"></a><span class="n">scrapper</span> <span class="o">=</span> <span class="n">Parsera</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="n">llm</span><span class="p">)</span> |
| 801 | +</span><span id="__span-0-16"><a id="__codelineno-0-16" name="__codelineno-0-16" href="#__codelineno-0-16"></a><span class="n">result</span> <span class="o">=</span> <span class="n">scrapper</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="n">url</span><span class="p">,</span> <span class="n">elements</span><span class="o">=</span><span class="n">elements</span><span class="p">)</span> |
803 | 802 | </span></code></pre></div></p> |
804 | 803 | <h2 id="run-local-model-with-ollama">Run local model with <code>Ollama</code></h2> |
805 | 804 | <p>First, you should install and run <code>ollama</code> in your local environment: <a href="https://github.com/ollama/ollama?tab=readme-ov-file#ollama">official installation guide</a>. |
|