As you may be aware, we are making massive use of OpenAI services to implement GenAI features in our process modeling tool Horus Business Modeler at Horus software GmbH.
With Oracle making the latest Meta Llama 3.1 model with 405 billion parameters available on OCI, we wanted to give it a try in the context of a joint lecture with Thomas Schuster from Pforzheim University.
This first blog post shows how to use a Google Colab Python notebook with LangChain to leverage the model running in the Chicago region of OCI.
Setup of OCI environment
In my OCI environment, I first had to subscribe to the Chicago region, since the latest Llama 3.1 models are not yet available in Frankfurt: https://docs.oracle.com/en-us/iaas/Content/generative-ai/pretrained-models.htm#pretrained-models
Or rather, they would only be available there when provisioning a dedicated AI cluster, which would have busted my budget for a PoC.
After subscribing to the Chicago region, you have to make sure that your identity domain (or the relevant one, if you have several) is synchronized to Chicago.
I've then created a group "ai-model-users" and a compartment "ai-tests" and attached the following policy:
allow group 'OracleIdentityCloudService'/'ai-model-users' to use generative-ai-chat in compartment ai-tests
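I did this through the console; if you prefer scripting it, the same policy can also be created with the OCI Python SDK. A minimal sketch, assuming an administrator profile in ~/.oci/config (the policy name and description below are my own, not from the console):

import oci

# Load an admin config and create an IAM client.
config = oci.config.from_file()
identity = oci.identity.IdentityClient(config)

# The same statement as above; the policy is attached at the tenancy root.
statement = ("allow group 'OracleIdentityCloudService'/'ai-model-users' "
             "to use generative-ai-chat in compartment ai-tests")
details = oci.identity.models.CreatePolicyDetails(
    compartment_id=config["tenancy"],
    name="ai-model-users-policy",
    description="Allow ai-model-users to use Generative AI chat",
    statements=[statement],
)
identity.create_policy(details)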
With these preparations done, we can create a new user in the appropriate identity domain and assign it to the ai-model-users group. Then, create an API key and make note of the private key (mykey.pem file in this sample) and the config file:
[DEFAULT]
user=ocid1.user.oc1..xxxxxxxxxxxxxxxxx
fingerprint=33:a9:dd:74:99:17:b9:86:xxxxxxxxx
tenancy=ocid1.tenancy.oc1..xxxxxxxxxyyyyy
region=eu-frankfurt-1
key_file=/root/.oci/mykey.pem
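Before going further, you can sanity-check this config from any machine with the OCI Python SDK installed (pip install oci); a quick sketch:

import oci

# Load the DEFAULT profile and validate it; validate_config raises
# oci.exceptions.InvalidConfig if a required entry or the key file is wrong.
config = oci.config.from_file("~/.oci/config", "DEFAULT")
oci.config.validate_config(config)
print("Config OK for user:", config["user"])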
With this, we're all set with the tenancy preparation.
Testing using the OCI Console (Playground)
Next, we can verify the setup of the user using https://cloud.oracle.com.
Sign in as the new user, switch to the Chicago region and go to "Analytics and AI" - "Generative AI". Under "Chat", select the Llama-3.1-405B model:
Choose the llama-3.1-405b-instruct model
Then, use some custom or sample text and test if the generation is working.
If it is working, go to "View Code", select "Python", and copy the code for later use:
# coding: utf-8
# Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
# This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at
# https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at
# http://www.apache.org/licenses/LICENSE-2.0. You may choose either license.

##########################################################################
# chat_demo.py
# Supports Python 3
##########################################################################
# Info:
# Get texts from LLM model for given prompts using OCI Generative AI Service.
##########################################################################
# Application Command line(no parameter needed)
# python chat_demo.py
##########################################################################

import oci

# Setup basic variables
# Auth Config
# TODO: Please update config profile name and use the compartmentId that has policies grant permissions for using Generative AI Service
compartment_id = "ocid1.tenancy.oc1..XXXXXXX"
CONFIG_PROFILE = "DEFAULT"
config = oci.config.from_file('~/.oci/config', CONFIG_PROFILE)

# Service endpoint
endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"

generative_ai_inference_client = oci.generative_ai_inference.GenerativeAiInferenceClient(config=config, service_endpoint=endpoint, retry_strategy=oci.retry.NoneRetryStrategy(), timeout=(10,240))
chat_detail = oci.generative_ai_inference.models.ChatDetails()

content = oci.generative_ai_inference.models.TextContent()
content.text = "Generate a product pitch for a USB connected compact microphone that can record surround sound. The microphone is most useful in recording music or conversations. The microphone can also be useful for recording podcasts."
message = oci.generative_ai_inference.models.Message()
message.role = "USER"
message.content = [content]

chat_request = oci.generative_ai_inference.models.GenericChatRequest()
chat_request.api_format = oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC
chat_request.messages = [message]
chat_request.max_tokens = 600
chat_request.temperature = 1
chat_request.frequency_penalty = 0
chat_request.presence_penalty = 0
chat_request.top_p = 0.75
chat_request.top_k = -1

chat_detail.serving_mode = oci.generative_ai_inference.models.OnDemandServingMode(model_id="ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceyarleil5jr7k2rykljkhapnvhrqvzx4cwuvtfedlfxet4q")
chat_detail.chat_request = chat_request
chat_detail.compartment_id = compartment_id

chat_response = generative_ai_inference_client.chat(chat_detail)

# Print result
print("**************************Chat Result**************************")
print(vars(chat_response))
That concludes the testing in the OCI console.
Preparation of a notebook
Next, we head to our favorite Python notebook environment. For this demo, I have chosen Google Colab: https://colab.research.google.com
First of all, we need to create the ~/.oci/config file. A universal way to do this is:
import os

os.makedirs("/root/.oci", exist_ok=True)

# create the config file
f = open("/root/.oci/config", "w")
f.write("[DEFAULT]\n")
f.write("user=ocid1.user.oc1..ZZZZZ\n")
f.write("fingerprint=33:a9:dd:74:99:17:b9:86:b3XXXXXX\n")
f.write("tenancy=ocid1.tenancy.oc1..YYYYY\n")
f.write("region=eu-frankfurt-1\n")
f.write("key_file=/root/.oci/mykey.pem\n")
f.close()

# open and read the file after writing it:
f = open("/root/.oci/config", "r")
print(f.read())

# create the mykey.pem file
f = open("/root/.oci/mykey.pem", "w")
pem_prefix = '-----BEGIN RSA PRIVATE KEY-----\n'
pem_suffix = '\n-----END RSA PRIVATE KEY-----'
key = "MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC7MXZN+yvIQpofzTrUHUeBQaV9H9irfCvn5N5n2bF8RxDDTKps0qp4bGGS8PTPMymNrKqo3AYZB0F7UzINaOPsHqOnL/XXXXXXXQF"  # the content of your private key
f.write(pem_prefix)
f.write(key)
f.write(pem_suffix)
f.close()
Running this should create the .oci folder, the config file, and the private key. You may need to adjust the folder to point to your user's home directory; for Google Colab, it is "/root".
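A word of caution: pasting the private key into the notebook source is convenient, but risky if you ever share the notebook. Google Colab's Secrets panel offers an alternative; a sketch assuming you stored the key body under a secret named OCI_PRIVATE_KEY (a name chosen here purely for illustration):

from google.colab import userdata

# Read the key from Colab's Secrets instead of hardcoding it in the notebook;
# userdata.get raises if the secret is missing or not shared with the notebook.
key = userdata.get("OCI_PRIVATE_KEY")
with open("/root/.oci/mykey.pem", "w") as f:
    f.write(key)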
Test using the OCI sample code
With these preparations in place, you can run the sample code from above. First, though, you have to install the oci pip module into the notebook. In Google Colab, that works as follows:
!pip install -U oci
Then, the unmodified sample code should work and produce output such as:
**************************Chat Result**************************
{'status': 200, 'headers': {'content-type': 'application/json', 'opc-request-id': 'XXXXX', 'content-encoding': 'gzip', 'content-length': '1400'}, 'data': {
  "chat_response": {
    "api_format": "GENERIC",
    "choices": [
      {
        "finish_reason": "stop",
        "index": 0,
        "logprobs": {
          "text_offset": null,
          "token_logprobs": null,
          "tokens": null,
          "top_logprobs": null
        },
        "message": {
          "content": [
            {
              "text": "**Introducing the OmniMic: Revolutionizing Audio Recording with Surround Sound**\n\nAre you a musician, podcaster, or content creator looking for a high-quality, easy-to-use microphone that can capture immersive audio? Look no further than the OmniMic, a compact USB-connected microphone that records stunning surround sound.\n\n**Immersive Audio at Your Fingertips**\n\nThe OmniMic is designed to capture 360-degree audio, allowing you to record music, conversations, and podcasts with unparalleled depth and clarity. Its advanced technology ensures that every nuance of sound is picked up, from the subtlest whisper to the loudest instrument.\n\n**Perfect for Musicians**\n\nWith the OmniMic, musicians can record live performances or rehearsals with ease. Capture the energy of your band's performance or create professional-sounding demos without breaking the bank. The microphone's compact size makes it perfect for recording in small spaces or on-the-go.\n\n**Ideal for Podcasters**\n\nTake your podcast to the next level with the OmniMic's crystal-clear audio. Record interviews, panel discussions, or solo episodes with confidence, knowing that every word will be captured in stunning detail. The microphone's plug-and-play design makes it easy to set up and start recording in minutes.\n\n**Key Features:**\n\n* Compact design (only 3 inches in diameter)\n* USB connectivity for easy plug-and-play use\n* Records surround sound (360-degree audio)\n* High-quality condenser capsules for clear and detailed sound\n* Compatible with Mac and PC\n* Includes carrying case and windscreen\n\n**Benefits:**\n\n* Easy to use: simply plug in and start recording\n* High-quality audio: captures every nuance of sound\n* Versatile: perfect for music recording, podcasting, voiceovers, and more\n* Portable: take it anywhere and record on-the-go\n\n**Who Can Benefit from the OmniMic?**\n\n* Musicians (solo artists or bands)\n* Podcasters (interview-style or solo episodes)\n* Voiceover artists\n* Content creators (YouTube videos, online courses)\n* Anyone looking for high-quality audio recording capabilities\n\n**Get Ready to Elevate Your Audio Game**\n\nExperience the power of surround sound recording with the OmniMic. Order now and discover a new world of immersive audio possibilities.\n\n**Pricing:** $199 (includes carrying case and windscreen)\n\n**Order Now:** Visit our website or visit your local music store today!",
              "type": "TEXT"
            }
          ],
          "name": null,
          "role": "ASSISTANT"
        }
      }
    ],
    "time_created": "2024-11-03T12:27:17.611000+00:00"
  },
  "model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.amaaaaaask7dceyarleil5jr7k2rykljkhapnvhrqvzx4cwuvtfedlfxet4q",
  "model_version": "1.0.0"
}}
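As a side note, if you only want the generated text rather than the full response dump, you can drill into the response object along the structure shown above. A small sketch reusing chat_response from the sample (the attribute path follows the JSON above, but may differ across SDK versions):

# Extract just the model's answer from the response object.
chat_text = chat_response.data.chat_response.choices[0].message.content[0].text
print(chat_text)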
This concludes the verification of the OCI setup; now we can easily set up a LangChain chain.
Setting up a LangChain chain
With the basic setup in place, we can now install and use the LangChain community module, which provides the OCI integration:
!pip install -U oci langchain-community
With that, you can run the samples from the LangChain documentation, e.g.:
from langchain_community.chat_models import ChatOCIGenAI

llm = ChatOCIGenAI(
    # model_id="cohere.command-r-16k",
    model_id="meta.llama-3.1-405b-instruct",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    compartment_id=compartment_id,  # as defined in the OCI sample above
    model_kwargs={"temperature": 0, "max_tokens": 500},
)

response = llm.invoke("Tell me one fact about earth", temperature=0.7)
print(response)
This should give you something like:
content="Here's one fact about Earth:\n\nApproximately 71% of the Earth's surface is covered in water, with the majority of it being oceans." additional_kwargs={'finish_reason': 'stop', 'time_created': '2024-11-03 12:38:56.523000+00:00'} response_metadata={'model_id': 'meta.llama-3.1-405b-instruct', 'model_version': '1.0.0', 'request_id': 'XXXX', 'content-length': '324', 'finish_reason': 'stop', 'time_created': '2024-11-03 12:38:56.523000+00:00'} id='run-0171543d-08fb-4ad9-a561-2aab01ed5ce4-0'
Summary
Using the procedure above, it is possible to run the latest Meta Llama 3.1 model with 405 billion parameters. The model provides a context window of 128K tokens.
The costs are quite affordable as well: $0.0267 per 10K transactions (one transaction corresponding to one token), i.e. roughly $2.67 per million tokens.
I'm looking forward to seeing how #Horus process model generation compares between the Meta and the OpenAI models. If the results are as impressive as the process models we generated with OpenAI's gpt-4o model, this could be a cost-saving and, especially once the service is GA in Frankfurt, a data-privacy-friendly option.