
What are Vertex AI and Gemini AI?

What is Vertex AI?

Vertex AI is a fully managed, unified AI development platform for building and using generative AI.

Integrate Gemini on Vertex AI with Firebase SDKs in iOS apps

Step 1: Set up "Build with Gemini" in your Firebase project

  1. In the Firebase console, go to the Build with Gemini page.
  2. Click the second card, Build AI-powered apps with the Gemini API, to do the following tasks:

    • Upgrade your project to use the Blaze pay-as-you-go pricing plan.
    • Enable the following two APIs for your project: aiplatform.googleapis.com and firebaseml.googleapis.com (these can also be enabled from the command line, as shown after this list).

  3. Continue to the next step in this guide to add the SDK to your app.
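
If you prefer the command line, the two APIs from the list above can also be enabled with the gcloud CLI (a sketch; it assumes the gcloud tool is installed and configured for the same project):

```
gcloud services enable aiplatform.googleapis.com firebaseml.googleapis.com
```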

Step 2: Add the FirebaseVertexAI-Preview library from the Firebase SDK

  1. In Xcode, with your app project open, navigate to File > Add Package Dependencies.
  2. Enter the Firebase SDK repository URL, https://github.com/firebase/firebase-ios-sdk, and click Add Package.
  3. Select the FirebaseVertexAI-Preview library and click Add Package.

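If you manage dependencies in a Package.swift manifest rather than through the Xcode UI, the equivalent declaration looks roughly like this (a sketch: the version pin, platform, and target name are placeholders, not values from this guide):

```swift
// swift-tools-version:5.9
// Package.swift (a sketch: version pin, platform, and target name are placeholders)
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v15)],
    dependencies: [
        // Same repository as in step 2 above.
        .package(url: "https://github.com/firebase/firebase-ios-sdk", from: "10.26.0"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // The preview library product exposed by the Firebase SDK.
                .product(name: "FirebaseVertexAI-Preview", package: "firebase-ios-sdk"),
            ]
        ),
    ]
)
```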

Step 3: Call the Vertex AI Gemini API

```swift
import FirebaseVertexAI

// Initialize the Vertex AI service
let vertex = VertexAI.vertexAI()

// Initialize the generative model with a model that supports your use case
// Gemini 1.5 models are versatile and can be used with all API capabilities
let model = vertex.generativeModel(modelName: "gemini-1.5-flash-preview-0514")

// Provide a prompt that contains text
let prompt = "Write a story about a magic backpack."

// To generate text output, call generateContent with the text input
let response = try await model.generateContent(prompt)
if let text = response.text {
  print(text)
}
```
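
The call above waits for the complete response. The preview SDK also exposes a streaming variant, generateContentStream, which yields partial results as they are generated; a minimal sketch, reusing the model and prompt from above:

```swift
// Stream the response chunk by chunk instead of waiting for the full result.
let contentStream = model.generateContentStream(prompt)
for try await chunk in contentStream {
  if let text = chunk.text {
    print(text, terminator: "")
  }
}
```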

Available Gemini models on Vertex AI

  • Gemini 1.5 Flash
    • gemini-1.5-flash-001
    • gemini-1.5-flash
    • gemini-1.5-flash-preview-0514
  • Gemini 1.5 Pro
    • gemini-1.5-pro-001
    • gemini-1.5-pro
    • gemini-1.5-pro-preview-0514
    • gemini-1.5-pro-preview-0409
  • Gemini 1.0 Pro Vision
    • gemini-1.0-pro-vision-001
    • gemini-1.0-pro-vision
  • Gemini 1.0 Pro
    • gemini-1.0-pro-002
    • gemini-1.0-pro-001
    • gemini-1.0-pro
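
Version-suffixed names (for example gemini-1.5-pro-001) pin a specific model version, while the bare names (for example gemini-1.5-flash) are aliases that Google auto-updates to the latest stable version. A minimal sketch of pinning one explicitly:

```swift
import FirebaseVertexAI

// Pin a specific stable version rather than the auto-updated alias, so the
// model's behavior does not change when Google repoints the alias.
let model = VertexAI.vertexAI().generativeModel(modelName: "gemini-1.5-pro-001")
```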

Gemini on Vertex AI pricing

  • Gemini 1.5 Flash
    • ≤ 128K context window: text input $0.000125 / 1k characters, text output $0.000375 / 1k characters
    • > 128K context window: text input $0.00025 / 1k characters, text output $0.00075 / 1k characters
  • Gemini 1.5 Pro
    • ≤ 128K context window: text input $0.00125 / 1k characters, text output $0.00375 / 1k characters
    • > 128K context window: text input $0.0025 / 1k characters, text output $0.0075 / 1k characters
  • Gemini 1.0 Pro
    • text input $0.000125 / 1k characters, text output $0.000375 / 1k characters (no > 128K context window tier)
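
As a worked example, a Gemini 1.5 Flash request with 8,000 input characters and 2,000 output characters (within the 128K context window) costs 8 × $0.000125 + 2 × $0.000375 = $0.00175.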

Available regions for Gemini on Vertex AI

Google Cloud uses regions to define regional APIs. For all generally available features of Generative AI on Vertex AI, Google Cloud stores customer data only in the region that you specify.

Generative AI on Vertex AI is available in the following regions:

  • United States (7)
    • Dallas, Texas (us-south1)
    • Iowa (us-central1)
    • Moncks Corner, South Carolina (us-east1)
    • Northern Virginia (us-east4)
    • Columbus, Ohio (us-east5)
    • Oregon (us-west1)
    • Las Vegas, Nevada (us-west4)
  • Canada (1)
    • Montréal (northamerica-northeast1)
  • South America (1)
    • São Paulo, Brazil (southamerica-east1)
  • Asia Pacific (7)
    • Changhua County, Taiwan (asia-east1)
    • Hong Kong, China (asia-east2)
    • Mumbai, India (asia-south1)
    • Singapore (asia-southeast1)
    • Sydney, Australia (australia-southeast1)
    • Tokyo, Japan (asia-northeast1)
    • Seoul, Korea (asia-northeast3)
  • Europe (10)
    • Belgium (europe-west1)
    • London, United Kingdom (europe-west2)
    • Frankfurt, Germany (europe-west3)
    • Netherlands (europe-west4)
    • Zürich, Switzerland (europe-west6)
    • Milan, Italy (europe-west8)
    • Paris, France (europe-west9)
    • Finland (europe-north1)
    • Madrid, Spain (europe-southwest1)
    • Warsaw, Poland (europe-central2)
  • Middle East (1)
    • Tel Aviv, Israel (me-west1)
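
When initializing the service in the Firebase SDK, you can route requests to one of these regions via the location parameter; a sketch (the location parameter follows the preview SDK at the time of writing, and us-central1 is the default if you pass nothing):

```swift
import FirebaseVertexAI

// Route requests to a specific supported region ("us-central1" is the default).
let vertex = VertexAI.vertexAI(location: "europe-west3")
let model = vertex.generativeModel(modelName: "gemini-1.5-flash-preview-0514")
```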

Model parameters

Top-K

Top-K changes how the model selects tokens for output.

A top-K of 1 means the next selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.

For each token selection step, the top-K tokens with the highest probabilities are sampled. Tokens are then further filtered based on top-P, with the final token selected using temperature sampling.

Specify a lower value for less random responses and a higher value for more random responses.

Top-P

Top-P changes how the model selects tokens for output.

Tokens are selected from the most (see top-K) to least probable until the sum of their probabilities equals the top-P value.

For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the top-P value is 0.5, then the model selects either A or B as the next token by using temperature and excludes C as a candidate.

Specify a lower value for less random responses and a higher value for more random responses.
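
To make the filtering order concrete, here is a small self-contained Swift sketch of the top-K then top-P steps described above (illustrative only, with hypothetical names; this is not the model's actual implementation):

```swift
// Illustrative only: hypothetical types showing the top-K then top-P
// filtering order, not the model's actual implementation.
struct Token {
    let text: String
    let probability: Double
}

func candidates(from vocabulary: [Token], topK: Int, topP: Double) -> [Token] {
    // 1. Keep only the top-K most probable tokens.
    let byProbability = vocabulary.sorted { $0.probability > $1.probability }
    let kFiltered = byProbability.prefix(topK)

    // 2. Walk from most to least probable, keeping tokens until their
    //    cumulative probability reaches top-P.
    var kept: [Token] = []
    var cumulative = 0.0
    for token in kFiltered where cumulative < topP {
        kept.append(token)
        cumulative += token.probability
    }
    // The final token is then drawn from `kept` using temperature sampling.
    return kept
}

// The example from the text: A = 0.3, B = 0.2, C = 0.1 with top-P = 0.5.
let vocabulary = [Token(text: "A", probability: 0.3),
                  Token(text: "B", probability: 0.2),
                  Token(text: "C", probability: 0.1)]
print(candidates(from: vocabulary, topK: 40, topP: 0.5).map(\.text)) // ["A", "B"]
```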

Temperature

The temperature is used for sampling during response generation, which occurs when topP and topK are applied.

Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results.

A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.

If the model returns a response that's too generic, too short, or the model gives a fallback response, try increasing the temperature.
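
All three parameters are set on the model via GenerationConfig in the Firebase SDK; a minimal sketch (the initializer labels follow the FirebaseVertexAI-Preview API at the time of writing):

```swift
import FirebaseVertexAI

// A sketch of setting the sampling parameters from this section.
let config = GenerationConfig(
    temperature: 0.2,      // low: focused, mostly deterministic output
    topP: 0.9,             // nucleus sampling threshold
    maxOutputTokens: 1024
)

// Top-K is omitted here because the Gemini 1.5 models do not support it
// (see the valid parameter values below).
let model = VertexAI.vertexAI().generativeModel(
    modelName: "gemini-1.5-flash-preview-0514",
    generationConfig: config
)
```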

Valid parameter values

  • Top-K
    • Gemini 1.0 Pro Vision: 1 - 40 (default 32)
    • Gemini 1.5 Pro: not supported
    • Gemini 1.5 Flash: not supported
  • Top-P
    • Gemini 1.0 Pro Vision: 0 - 1.0 (default 1.0)
    • Gemini 1.5 Pro: 0 - 1.0 (default 0.95)
    • Gemini 1.5 Flash: 0 - 1.0 (default 0.95)
  • Temperature
    • Gemini 1.0 Pro Vision: 0 - 1.0 (default 0.4)
    • Gemini 1.5 Pro: 0 - 2.0 (default 1.0)
    • Gemini 1.5 Flash: 0 - 2.0 (default 1.0)
