This is a Google Apps Script library for Gemini API with files.
A new Google Apps Script library called GeminiWithFiles simplifies using Gemini, a large language model, to process unstructured data like images and PDFs. GeminiWithFiles can upload files, generate content, and create descriptions from multiple images at once. This significantly reduces workload and expands possibilities for using Gemini.
Recently, Gemini, a large language model from Google AI, has brought new possibilities to various tasks by enabling the use of unstructured data as structured data. This is particularly significant because a vast amount of information exists in unstructured formats like text documents, images, and videos.
Gemini 1.5 API, released recently, significantly expands these capabilities. It can generate content by up to 1 million tokens, a substantial increase compared to previous versions. Additionally, Gemini 1.5 can now process up to 3,000 image files, vastly exceeding the 16-image limit of Gemini 1.0. Ref
While Gemini cannot directly work with Google Drive formats like Docs, Sheets, Slides, or PDFs yet, there are workarounds. In a previous report, I demonstrated a method for converting PDF data into images that Gemini can then process for tasks like invoice parsing. Ref
This report introduces a new Google Apps Script library called "GeminiWithFiles" that simplifies this process. GeminiWithFiles allows users to easily upload files and generate content using Gemini's powerful capabilities. It also enables efficient description creation from multiple images with a single API call, significantly reducing the workload compared to processing each image individually as demonstrated in my prior report. Ref
By streamlining the process and expanding capabilities, GeminiWithFiles holds promise for various use cases across different domains. This report serves as an extended approach to the previous one, aiming to further reduce process costs and improve efficiency when working with Gemini and unstructured data.
I created this library based on the following reports.
- Automatically Creating Descriptions of Files on Google Drive using Gemini Pro API with Google Apps Script
- Categorization using Gemini Pro API with Google Apps Script
- Guide to Function Calling with Gemini and Google Apps Script
- Creating Image Bot using Gemini with Google Apps Script
- Crafting Bespoke Output Formats with Gemini API
- Generating Texts using Files Uploaded by Gemini 1.5 API
- Specifying Output Types for Gemini API with Google Apps Script
- Parsing Invoices using Gemini 1.5 API with Google Apps Script
- Taming the Wild Output: Effective Control of Gemini API Response Formats with response_mime_type
This library GeminiWithFiles allows you to interact with Gemini, a powerful document processing and management platform, through an easy-to-use API. Here's what you can achieve with this library:
File Management:
- Upload files to Gemini for storage and future processing with an asynchronous process.
- Retrieve a list of files currently stored in your Gemini account.
- Delete files from your Gemini account with an asynchronous process.
Content Upload:
- Upload various file formats including Google Docs (Documents, Spreadsheets, Slides), and PDFs. Gemini will convert each page of the uploaded file into images for further processing.
Chat History Management:
- Save your chat history for later analysis or retrieval.
Content Generation:
- Process multiple files at once (e.g., images, papers, invoices) using a single API call to generate new content based on the uploaded data.
Output Specification:
-
Specify the desired output format for the results generated by the Gemini API.
-
Using
response_mime_type
and JSON schema, the output format is controlled. Ref
In order to test this script, please do the following steps.
Please access https://makersuite.google.com/app/apikey and create your API key. At that time, please enable Generative Language API at the API console. This API key is used for this sample script.
This official document can also be seen. Ref.
Please create a standalone Google Apps Script project. Of course, this script can also be used with the container-bound script.
And, please open the script editor of the Google Apps Script project.
There are 2 patterns for using GeminiWithFiles.
If you use this library as a Google Apps Script library, please install the library to your Google Apps Script project as follows.
-
Create a Google Apps Script project. Or, open your Google Apps Script project.
- You can use this library for the Google Apps Script project of both the standalone and container-bound script types.
-
- The library's project key is as follows.
1dolXnIeXKz-BH1BlwRDaKhzC2smJcGyVxMxGYhaY2kqiLa857odLXrIC
This library uses another library PDFApp. This is used for converting PDF data to image data. PDFApp has already been installed in this library. So, you are not required to operate about this.
If you use this library in your own Google Apps Script project, please copy and paste the script "classGeminiWithFiles.js" into your Google Apps Script project. By this, the script can be used.
"main.js" is used for the Google Apps Script library. So, in this pattern, you are not required to use it.
In this case, please copy and paste the script of PDFApp into the same project. This is used for converting PDF data to image data. When an error like ReferenceError: PDFApp is not defined
occurs, please check this.
This library uses the following 2 scopes.
https://www.googleapis.com/auth/script.external_request
https://www.googleapis.com/auth/drive
If you want to use the access token, please link the Google Cloud Platform Project to the Google Apps Script Project. And, please add the following scope.
https://www.googleapis.com/auth/generative-language
Also, you can see the official document of Gemini API at https://ai.google.dev/api/rest.
Methods | Description |
---|---|
setFileIds(fileIds, asImage = false) | Set file IDs. |
setBlobs(blobs, pdfAsImage = false) | Set blobs. |
withUploadedFilesByGenerateContent(fileList = []) | Create object for using the generateContent method. |
uploadFiles(n = 50) | Upload files to Gemini. |
getFileList() | Get file list in Gemini. |
deleteFiles(names, n = 50) | Delete files from Gemini. |
generateContent(object) | Main method. Generate content by Gemini API. |
When you install GeminiWithFiles as a library to your Google Apps Script project, please use the following script.
const g = GeminiWithFiles.geminiWithFiles(object);
or
When you directly copy and paste the script of Class GeminiWithFiles into your Google Apps Script project, please use the following script.
const g = new GeminiWithFiles(object);
The value of object
is as follows.
{Object} object API key or access token for using Gemini API.
{String} object.apiKey API key.
{String} object.accessToken Access token.
{String} object.model Model. Default is "models/gemini-1.5-pro-latest".
{String} object.version Version of API. Default is "v1beta".
{Boolean} object.doCountToken Default is false. If this is true, when Gemini API is requested, the token of request is shown in the log.
{Array} object.history History for continuing chat.
{Array} object.functions If you want to give the custom functions, please use this.
{String} object.response_mime_type In the current stage, only "application/json" can be used.
{Object} object.systemInstruction Ref: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini
{Object} object.exportTotalTokens When this is true, the total tokens are exported as the result value. At that time, the generated content and the total tokens are returned as an object.
-
When you want to use
response_mime_type
, please givejsonSchema
to generateContent method. In the current stage, only"application/json"
can be used toresponse_mime_type
. -
When you want to use
systemInstruction
, please confirm the official document Ref. -
In this library, as the default, the function calling is used. If you want to generate content without the function calling, please use
functions: {}
likeconst g = GeminiWithFiles_test.geminiWithFiles({ apiKey, functions: {} });
. By this, the content is generated without the function calling. -
Gemini 1.5 Pro Latest (
models/gemini-1.5-pro-latest
) is used as the default model. When you want to use Gemini 1.5 Flash Latest (models/gemini-1.5-flash-latest
), please use it likeconst g = GeminiWithFiles.geminiWithFiles({ apiKey, model: "models/gemini-1.5-flash-latest" })
.
Set file IDs. The files of file IDs are uploaded to Gemini.
In this case, async/await is used in the function.
async function myFunction() {
const apiKey = "###"; // Please set your API key.
const folderId = "###"; // Please set your folder ID including images.
let fileIds = [];
const files = DriveApp.getFolderById(folderId).getFiles();
while (files.hasNext()) {
const file = files.next();
fileIds.push(file.getId());
}
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = await g.setFileIds(fileIds, true).uploadFiles();
console.log(res);
}
- The 1st and 2nd arguments of
setFileIds
are String[] (the file IDs on Google Drive) and the boolean, respectively. If the 2nd argument is false, the inputted files of file IDs are uploaded as raw data. If the 2nd argument is true, the inputted files of file IDs are converted to image data and are uploaded. - In the current stage (April 26, 2024), the generateContent of Gemini API cannot directly process PDF data. So, as the current workaround, PDF data can be processed by converting to image data. At that time, this argument is used as
true
.- On July 23, 2024, I confirmed that PDF data can be directly used with Gemini API. So, in the current stage, when you use PDF data, you can use
false
at this method likesetFileIds(fileIds, false)
.
- On July 23, 2024, I confirmed that PDF data can be directly used with Gemini API. So, in the current stage, when you use PDF data, you can use
Set blobs. The blobs are uploaded to Gemini.
async function myFunction() {
const apiKey = "###"; // Please set your API key.
const folderId = "###"; // Please set your folder ID including images.
const blobs = [];
const files = DriveApp.getFolderById(folderId).getFiles();
while (files.hasNext()) {
blobs.push(files.next().getBlob());
}
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = await g.setBlobs(blobs, false).uploadFiles();
console.log(res);
}
- The 1st and 2nd arguments of
setBlobs
are Blob[] and boolean, respectively. The default value of 2nd argument isfalse
. If this is true, when the blob is PDF data, each page is converted to image data.- On July 23, 2024, I confirmed that PDF data can be directly used with Gemini API. So, in the current stage, when you use PDF data, you can use
false
at this method likesetBlobs(blobs, false)
.
- On July 23, 2024, I confirmed that PDF data can be directly used with Gemini API. So, in the current stage, when you use PDF data, you can use
- By the future update, when the PDF data can be directly processed, it is considered that this value can be always used as false for PDF data.
Create object for using the generateContent method.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const q = "###"; // Please set your question.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const fileList = g.getFileList();
const res = g
.withUploadedFilesByGenerateContent(fileList)
.generateContent({ q });
console.log(res);
}
withUploadedFilesByGenerateContent
has only one argument. That is the value from the getFileList method. You can see the actual values after you uploaded files.- The uploaded files can be used with generateContent of Gemini API by this method.
Upload files to Gemini. The files are uploaded to Gemini using the inputted file IDs or blobs.
async function myFunction() {
const apiKey = "###"; // Please set your API key.
const fileIds = ["###fileId1###", "###fileId2###", , ,]; // Please set your file IDs in this array.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = await g.setFileIds(fileIds, false).uploadFiles();
console.log(res);
}
In this script, the files of fileIds
are uploaded to Gemini with the raw data. If setFileIds(fileIds, false)
is modified to setFileIds(fileIds, true)
, the files are uploaded to Gemini as images.
Get file list in Gemini.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = g.getFileList();
console.log(res);
}
Delete files from Gemini.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const names = g.getFileList().map(({ name }) => name);
if (names.length == 0) return;
g.deleteFiles(names);
console.log(`${names.length} files were deleted.`);
}
- In this script, all files on Gemini are deleted. So, please be careful about this.
- By the way, in the current stage, the expiration time of the uploaded file is 2 days. So, the uploaded file is automatically deleted 2 days later.
Main method. Generate content by Gemini API. More sample scripts can be seen in the following "Sample scripts" section.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = g.generateContent({ q: "What is Google Apps Script?" });
console.log(res);
}
In this script, the content is generated with the function calling. If you want to simply generate content without the function calling, please use the following script.
If an error like "code": 500, "message": "An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting", "status": "INTERNAL"
occurs, please test functions: {}
as follows.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey, functions: {} }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, functions: {} }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = g.generateContent({ q: "What is Google Apps Script?" });
console.log(res);
}
When you want to use response_mime_type
, please give jsonSchema
to generateContent method as follows. In this case, by giving only JSON schema, this library can return a valid object. You can also see the detailed information about response_mime_type
at my report.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({
apiKey,
response_mime_type: "application/json",
}); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, response_mime_type: "application/json" }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const jsonSchema = {
title: "5 popular cookie recipes",
description: "List 5 popular cookie recipes.",
type: "array",
items: {
type: "object",
properties: {
recipe_name: {
description: "Names of recipe.",
type: "string",
},
},
},
};
const res = g.generateContent({ jsonSchema });
console.log(res);
}
When this script is run, the following result is obtained.
[
{ "recipe_name": "Chocolate Chip Cookies" },
{ "recipe_name": "Peanut Butter Cookies" },
{ "recipe_name": "Oatmeal Cookies" },
{ "recipe_name": "Sugar Cookies" },
{ "recipe_name": "Snickerdoodle Cookies" }
]
When q
is used, only text question can be used for generating content. When you want to use your custom parts, you can do it as follows.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({
apiKey,
response_mime_type: "application/json",
}); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, response_mime_type: "application/json" }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const parts = [{ text: "What is Google Apps Script?" }];
const res = g.generateContent({ parts });
console.log(res);
}
console.log(GeminiWithFiles.geminiWithFiles().functions);
or
console.log(new GeminiWithFiles().functions);
function myFunction_history() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
// Question 1
const res1 = g.generateContent({ q: "What is Google Apps Script?" });
console.log(res1);
// Question 2
const res2 = g.generateContent({ q: "What is my 1st question?" });
console.log(res2);
console.log(g.history); // Here
}
- With the last line, the history of this chat can be confirmed. Of course, you can use this history in another execution as follows.
function myFunction() {
const history = [, , ,]; // Please set your history.
const g = GeminiWithFiles.geminiWithFiles({ apiKey, history }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, history }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = g.generateContent({ q: "What is my 1st question?" });
console.log(res);
}
In this explanation, when this script is used as a Google Apps Script library, in order to create a constructor, GeminiWithFiles.geminiWithFiles
is used. When this script is used by directly copying and pasting it to your Google Apps Script project, new GeminiWithFiles
is used instead of GeminiWithFiles.geminiWithFiles
. Please be careful about this.
The sample scripts are as follows.
- Generate content
- Chat1
- Chat2
- Upload files to Gemini
- Upload image files and create descriptions of images
- Upload invoices of PDF data and parse them
- Upload papers of PDF data and summarize them
- Samples using response_mime_type
- Sample using systemInstruction
- Generate content with a movie file
- Export total tokens
- Use large file (over 50 MB)
This script generates content from a text.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = g.generateContent({ q: "What is Google Apps Script?" });
console.log(res);
}
- If you use this by installing it as a library using the library key, please use
const g = new GeminiWithFiles.geminiWithFiles({ apiKey });
. - If you use this by directly copying and pasting, please use
const g = new Gemini({ apiKey });
.
This script generates content with a chat.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
// Question 1
const res1 = g.generateContent({ q: "What is Google Apps Script?" });
console.log(res1);
// Question 2
const res2 = g.generateContent({ q: "What is my 1st question?" });
console.log(res2);
}
When this script is run, res1
and res2
are as follows.
res1
Google Apps Script is a rapid application development platform that makes it fast and easy to create business applications that integrate with Google Workspace.
res2
Your first question was "What is Google Apps Script?"
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
// Question 1
const q =
"Return the current population of Kyoto, Osaka, Aichi, Fukuoka, Tokyo in Japan as JSON data with the format that the key and values are the prefecture name and the population, respectively.";
const res1 = g.generateContent({ q });
console.log(res1);
// Question 2
const res2 = g.generateContent({
q: "Also, return the current area of them as JSON data with the format that the key and values are the prefecture name and the area (km^2), respectively.",
});
console.log(res2);
}
When this script is run, the following values can be seen in the log. By doCountToken: true
, you can see the total tokens.
{
"totalTokens": 40
}
res1
{
Kyoto: 1464956,
Fukuoka: 5135214,
Osaka: 8838716,
Tokyo: 14047594,
Aichi: 7552873
}
{
"totalTokens": 77
}
res2
{
Kyoto: 4612.71,
Tokyo: 2194.07,
Aichi: 5172.4,
Osaka: 1904.99,
Fukuoka: 4986.51
}
In this case, async/await is used in the function.
async function myFunction() {
const apiKey = "###"; // Please set your API key.
const folderId = "###"; // Please set your folder ID including images.
let fileIds = [];
const files = DriveApp.getFolderById(folderId).getFiles();
while (files.hasNext()) {
const file = files.next();
fileIds.push(file.getId());
}
const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = await g.setFileIds(fileIds, false).uploadFiles();
console.log(res);
}
- When this script is run, the files can be uploaded to Gemini. The uploaded files can be used to generate content with Gemini API.
- In my test, when the files are uploaded using this script, I confirmed that 100 files can always be uploaded. But, when the number of files is more than 100, an error of
Exceeded maximum execution time
sometimes occurs. Please be careful about this.
In this sample, multiple image files are uploaded and the descriptions are created from the uploaded image files. This sample will be the expanded version of my previous report "Automatically Creating Descriptions of Files on Google Drive using Gemini Pro API with Google Apps Script".
async function myFunction() {
const apiKey = "###"; // Please set your API key.
const folderId = "###"; // Please set your folder ID including images.
const q = [
`Create each description from each image file within 100 words in the order of given fileData.`,
`Return the results as an array`,
`Return only raw Array without a markdown. No markdown format.`,
`The required properties of each element in the array are as follows`,
``,
`[Properties of each element in the array]`,
`"name": "Name of file"`,
`"description": "Created description"`,
``,
`If the requirement information is not found, set "no value".`,
`Return only raw Array without a markdown. No markdown format. No markdown tags.`,
].join("\n");
const fileIds = [];
const files = DriveApp.searchFiles(
`(mimeType = 'image/png' or mimeType = 'image/jpeg') and trashed = false and '${folderId}' in parents`
);
while (files.hasNext()) {
fileIds.push(files.next().getId());
}
if (fileIds.length == 0) return;
const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const fileList = await g.setFileIds(fileIds).uploadFiles();
const res = g
.withUploadedFilesByGenerateContent(fileList)
.generateContent({ q });
// g.deleteFiles(fileList.map(({ name }) => name)); // If you want to delete the uploaded files, please use this.
console.log(res);
}
When this script is run, the following result is obtained. In this case, the value of name
is the file ID.
[
{
"name": "###",
"description": "###"
},
,
,
,
]
When 20 sample images generated by Gemini are used, the following result is obtained.
When this script is run, 20 images are uploaded and the descriptions of the uploaded 20 images can be obtained by one API call.
As an important point, in my test, when the number of image files is large, it was required to separate the script between the file upload and the content generation. Also, in the case of 50 image files, the descriptions could be correctly created. But, in the case of more than 50 images, there was a case that an error occurred. So, please adjust the number of files to your situation.
In this sample, multiple invoices of PDF files are uploaded and they are parsed as an object. This sample will be the expanded version of my previous report "Parsing Invoices using Gemini 1.5 API with Google Apps Script".
async function myFunction_parseInvoices() {
const apiKey = "###"; // Please set your API key.
// Please set file IDs of PDF files of invoices.
const fileIds = ["###fileID1###", "###fileID2###"];
const q = [
`Create an array including JSON object parsed the following images of the invoices.`,
`The giving images are the invoices.`,
`Return an array including JSON object.`,
`No descriptions and explanations. Return only raw array including JSON objects without markdown. No markdown format.`,
`The required properties in each JSON object in an array are as follows.`,
``,
`[Properties in JSON object]`,
`"name": "Name given as 'Filename'"`,
`"invoiceTitle": "title of invoice"`,
`"invoiceDate": "date of invoice"`,
`"invoiceNumber": "number of the invoice"`,
`"invoiceDestinationName": "Name of destination of invoice"`,
`"invoiceDestinationAddress": "address of the destination of invoice"`,
`"totalCost": "total cost of all costs"`,
`"table": "Table of invoice. This is a 2-dimensional array. Add the first header row to the table in the 2-dimensional array."`,
``,
`[Format of 2-dimensional array of "table"]`,
`"title or description of item", "number of items", "unit cost", "total cost"`,
``,
`If the requirement information is not found, set "no value".`,
`Return only raw array including JSON objects without markdown. No markdown format. No markcodn tags.`,
].join("\n");
const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const fileList = await g.setFileIds(fileIds, true).uploadFiles();
const res = g
.withUploadedFilesByGenerateContent(fileList)
.generateContent({ q });
// g.deleteFiles(fileList.map(({ name }) => name)); // If you want to delete the uploaded files, please use this.
console.log(res);
}
As the sample papers, when the following papers are used,
This sample invoice is from Invoice design templates of Microsoft.
This sample invoice is from Invoice design templates of Microsoft.
the following result was obtained by one API call. It is found that the uploaded invoices converted from PDF data to image data can be correctly parsed.
[
{
"name": "###fileID1###",
"invoiceDate": "4/1/2024",
"totalCost": "$192.50",
"invoiceNumber": "100",
"invoiceDestinationAddress": "The Palm Tree Nursery\\n987 6th Ave\\nSanta Fe, NM 11121",
"invoiceTitle": "Invoice",
"invoiceDestinationName": "Maria Sullivan",
"table": [
[
"Salesperson",
"Job",
"Sales",
"Description",
"Unit Price",
"Line Total"
],
["Sonu Jain", "", "20.00", "Areca palm", "$2.50", "$50.00"],
["", "", "35.00", "Majesty palm", "$3.00", "$105.00"],
["", "", "15.00", "Bismarck palm", "$2.50", "$37.50"]
]
},
{
"name": "###fileID2###",
"invoiceDate": "4/5, 2024",
"invoiceTitle": "INVOICE",
"invoiceDestinationAddress": "Downtown Pets\\n132 South Street\\nManhattan, NY 15161",
"totalCost": "$4350",
"table": [
["DESCRIPTION", "HOURS", "RATE", "AMOUNT"],
["Pour cement foundation", "4.00", "$150.00", "$600"],
["Framing and drywall", "16.00", "$180.00", "$2880"],
["Tiling and flooring install", "9.00", "$150.00", "$1350"]
],
"invoiceDestinationName": "Nazar Neill",
"invoiceNumber": "4/5"
}
]
In this sample, multiple papers of PDF data are uploaded, and the summarized texts for each paper are output.
async function myFunction_parsePapers() {
const apiKey = "###"; // Please set your API key.
// Please set file IDs of the papers of PDF files.
const fileIds = ["###fileID1###", "###fileID2###"];
const q = [
`Summary the following manuscripts within 500 words.`,
`Return the results as an array`,
`Return only raw Array without a markdown. No markdown format.`,
`The required properties of each element in the array are as follows`,
``,
`[Properties of each element in the array]`,
`"name": "Name given as 'Filename'"`,
`"title": "Title of manuscript`,
`"summary": "Created description"`,
``,
`If the requirement information is not found, set "no value".`,
`Return only raw Array without a markdown. No markdown format. No markdown tags.`,
].join("\n");
const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const fileList = await g.setFileIds(fileIds, true).uploadFiles();
const res = g
.withUploadedFilesByGenerateContent(fileList)
.generateContent({ q });
// g.deleteFiles(fileList.map(({ name }) => name)); // If you want to delete the uploaded files, please use this.
console.log(res);
}
As the sample papers, when the following papers are used,
- Title: The Particle Problem in the General Theory of Relativity, A. Einstein and N. Rosen, Phys. Rev. 48, 73 – Published 1 July 1935
- Title: Attention Is All You Need, Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez,Lukasz Kaiser,Illia Polosukhin,NIPS (2017)
the following result was obtained by one API call. It is found that the uploaded papers converted from PDF data to image data can be processed.
[
{
"name": "###fileID1###",
"title": "The Particle Problem in the General Theory of Relativity",
"summary": "This paper investigates the possibility of a singularity-free solution to the field equations in general relativity. The authors propose a new theoretical approach that eliminates singularities by introducing a new variable into the equations. They explore the implications of this approach for the understanding of particles, suggesting that particles can be represented as \"bridges\" connecting different sheets of spacetime."
},
{
"name": "###fileID2###",
"title": "Attention Is All You Need",
"summary": "This paper proposes a novel neural network architecture called the Transformer, which relies entirely on an attention mechanism to draw global dependencies between input and output sequences. The Transformer model achieves state-of-the-art results on machine translation tasks and offers significant advantages in terms of parallelization and computational efficiency compared to recurrent neural networks."
}
]
In the current stage, only "application/json"
can be used to response_mime_type
.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({
apiKey,
doCountToken: true,
response_mime_type: "application/json",
}); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, response_mime_type: "application/json" }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res1 = g.generateContent({ q: "What is Google Apps Script?" });
console.log(res1);
}
In this case, the result is returned as an array as follows.
[
"Google Apps Script is a cloud-based scripting platform that lets you integrate with and automate tasks across Google products like Gmail, Calendar, Drive, and more. It's based on JavaScript and provides easy ways to automate tasks across Google products and third-party services."
]
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({
apiKey,
doCountToken: true,
response_mime_type: "application/json",
}); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, response_mime_type: "application/json" }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
// Question 1
const jsonSchema1 = {
title: "Current population of Kyoto, Osaka, Aichi, Fukuoka, Tokyo in Japan",
description:
"Return the current population of Kyoto, Osaka, Aichi, Fukuoka, Tokyo in Japan",
type: "object",
properties: {
propertyNames: {
description: "Prefecture names",
},
patternProperties: {
"": { type: "number", description: "Population" },
},
},
};
const res1 = g.generateContent({ jsonSchema: jsonSchema1 });
console.log(res1);
// Question 2
const jsonSchema2 = {
title: "Current area of them",
description: "Return the current area of them.",
type: "object",
properties: {
propertyNames: {
description: "Prefecture names",
},
patternProperties: {
"": { type: "number", description: "Area. Unit is km^2." },
},
},
};
const res2 = g.generateContent({ jsonSchema: jsonSchema2 });
console.log(res2);
}
In this case, the result values can be obtained by giving only JSON schema. The result is as follows.
For 1st question
{
"Kyoto": 2579970,
"Osaka": 8837684,
"Aichi": 7552873,
"Fukuoka": 5138217,
"Tokyo": 14047594
}
For 2nd question
{
"Kyoto": 4612.19,
"Osaka": 1904.99,
"Aichi": 5172.92,
"Fukuoka": 4986.51,
"Tokyo": 2194.07
}
async function myFunction() {
const apiKey = "###"; // Please set your API key.
// Please set file IDs of PDF file of invoices.
const fileIds = ["###fileID1###", "###fileID2###"];
const jsonSchema = {
title:
"Array including JSON object parsed the following images of the invoices",
description:
"Create an array including JSON object parsed the following images of the invoices.",
type: "array",
items: {
type: "object",
properties: {
name: {
description: "Name given as 'Filename'",
type: "string",
},
invoiceTitle: {
description: "Title of invoice",
type: "string",
},
invoiceDate: {
description: "Date of invoice",
type: "string",
},
invoiceNumber: {
description: "Number of the invoice",
type: "string",
},
invoiceDestinationName: {
description: "Name of destination of invoice",
type: "string",
},
invoiceDestinationAddress: {
description: "Address of the destination of invoice",
type: "string",
},
totalCost: {
description: "Total cost of all costs",
type: "string",
},
table: {
description:
"Table of invoice. This is a 2-dimensional array. Add the first header row to the table in the 2-dimensional array. The column should be 'title or description of item', 'number of items', 'unit cost', 'total cost'",
type: "array",
},
},
required: [
"name",
"invoiceTitle",
"invoiceDate",
"invoiceNumber",
"invoiceDestinationName",
"invoiceDestinationAddress",
"totalCost",
"table",
],
additionalProperties: false,
},
};
const g = GeminiWithFiles.geminiWithFiles({
apiKey,
doCountToken: true,
response_mime_type: "application/json",
}); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, doCountToken: true, response_mime_type: "application/json" }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const fileList = await g.setFileIds(fileIds, true).uploadFiles();
const res = g
.withUploadedFilesByGenerateContent(fileList)
.generateContent({ jsonSchema });
// g.deleteFiles(fileList.map(({ name }) => name)); // If you want to delete the uploaded files, please use this.
console.log(JSON.stringify(res));
}
When this script is run to the same invoices of the section "Upload invoices of PDF data and parse them", the same result is obtained.
If you want to return the value of High-complexity JSON schemas, response_mime_type
might be suitable.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const systemInstruction = { parts: [{ text: "You are a cat. Your name is Neko." }] };
const g = GeminiWithFiles.geminiWithFiles({ apiKey, systemInstruction, response_mime_type: "application/json" }); // This is for installing GeminiWithFiles as a library.
// const g = new GeminiWithFiles({ apiKey, systemInstruction, response_mime_type: "application/json" }); // This is for directly copying and pasting Class GeminiWithFiles into your Google Apps Script project.
const res = g.generateContent({ q: "What is Google Apps Script?" });
console.log(res);
}
When this script is run, [ 'Meow? What is Google Apps Script? Is it something I can chase? 😹' ]
is returned. You can see the value of systemInstruction
is reflected in the generated content.
async function myFunction() {
const apiKey = "###"; // Please set your API key.
const fileIds = ["###"]; // Please set your movie file (MP4).
const g = GeminiWithFiles.geminiWithFiles({ apiKey, functions: {} });
const fileList = await g.setFileIds(fileIds).uploadFiles();
const res = g.withUploadedFilesByGenerateContent(fileList).generateContent({ q: "Describe this video." });
console.log(res);
}
-
When this script is run, a MP4 video file is uploaded to Gemini and generate content with the uploaded video file.
-
As an important point, in the current stage, the maximum upload size with UrlFetchApp of Google Apps Script is 50 MB. Ref So, when you upload the video file, please use the file size less than 50 MB. Please be careful about this.
From v1.0.7, when doCountToken: true
and exportTotalTokens: true
are used in the object of the argument of geminiWithFiles
, the total tokens are returned. In this case, the returned value is an object like {returnValue: "###", totalTokens: ###}
. The sample script is as follows.
function myFunction() {
const apiKey = "###"; // Please set your API key.
const g = GeminiWithFiles.geminiWithFiles({ apiKey, doCountToken: true, model: "models/gemini-1.5-pro-latest", functions: {}, exportTotalTokens: true });
const res = g.generateContent({ q: "What is Gemini?" });
console.log(res);
}
When this script is run, the following result is returned.
{
returnValue: '"Gemini" can refer to several things, so to give you the most accurate answer, I need more context. What is it about "Gemini" that you want to know? \n\nHere are some possibilities:\n\n* **Gemini (astrology):** This is the third astrological sign in the Zodiac, represented by twins. People born under this sign (roughly May 21 - June 21) are often described as adaptable, curious, and social.\n* **Gemini (cryptocurrency exchange):** Gemini is a popular platform for buying, selling, and storing cryptocurrencies like Bitcoin and Ethereum. \n* **Google Gemini:** This is a new family of multimodal AI models from Google, designed to be even more capable than previous models.\n* **The Gemini spacecraft:** This was a NASA space program from the 1960s, focusing on two-person spacecraft for missions like spacewalks and rendezvous.\n* **Project Gemini:** This could refer to several different projects, so I\'d need more information to tell you which one you mean.\n\nPlease tell me more about what kind of "Gemini" you\'re interested in!',
totalTokens: 5
}
In this case, the data is required to upload with the resumable upload. You can see the sample script in the "Appendix" section of Uploading Large Files to Gemini with Google Apps Script: Overcoming 50 MB Limit.
- If an error occurs, please try again after several minutes.
- In generative AI, the output is highly dependent on the input prompt (the question you provide). Therefore, if the generated text doesn't meet your expectations, try reformulating your prompt and try again.
- On April 26, 2024, the following mimeTypes can be used with generateContent. Ref I believe that this will be expanded in the future update. For example, I believe that PDF data can be directly used with generateContent in the future.
- Images:
image/png,image/jpeg,image/webp,image/heic,image/heif
- Videos:
audio/wav,audio/mp3,audio/aiff,audio/aac,audio/ogg,audio/flac
- In my test, when the files are uploaded using this script, I confirmed that 100 files can be always uploaded. But, when the number of files is more than 100, an error of
Exceeded maximum execution time
sometimes occurs. Please be careful about this.
I have already proposed the following future requests to the Google issue tracker. Ref
-
I think it would be even more beneficial for users of Gemini if files on Google Drive could be directly used by the Gemini API using just their file IDs. This would also significantly reduce the cost of uploading data.
-
I think that the ability to include custom metadata with uploaded files would be very useful for managing large numbers of files.
-
When I tested the function calling for controlling the output format, I sometimes got an error of the status code 500. But, when I tested
response_mime_type
, such an error rarely occurred. I'm not sure whether this is the current specification. -
The top abstract image was created by Gemini from the section of "Description".
-
v1.0.0 (April 26, 2024)
- Initial release.
-
v1.0.1 (May 2, 2024)
response_mime_type
got to be able to be used for controlling the output format. Ref
-
v1.0.2 (May 7, 2024)
- For generating content,
parts
was added. From this version, you can select one ofq
,jsonSchema
, andparts
. - From this version,
systemInstruction
can be used. - In order to call the function call,
toolConfig
was added to the request body.
- For generating content,
-
v1.0.3 (May 17, 2024)
- Bugs were removed.
-
v1.0.4 (May 29, 2024)
- Recently, when
model.countToken
is used with the uploaded files, I confirmed that an error likeYou do not have permission to access the File ### or it may not exist.
occurred. In order to handle this issue, I modified the library. - In order to use the movie files for generateContent, I modified the library. Ref
- Recently, when
-
v1.0.5 (June 7, 2024)
- Spelling mistakes in the warning message were modified. The wait time for changing the value of state for the movie file is changed from 5 seconds to 10 seconds per cycle.
-
v1.0.6 (June 15, 2024)
- Included the script of PDFApp in this library.
-
v1.0.7 (July 4, 2024)
- From this version, when
doCountToken: true
andexportTotalTokens: true
are used in the object of the argument ofgeminiWithFiles
, the total tokens are returned. In this case, the returned value is an object like{returnValue: "###", totalTokens: ###}
.
- From this version, when