Merge branch 'main' into manufacturing-report

This commit is contained in:
commit 9f77cf99a3
@@ -3,10 +3,41 @@ title: "Getting Started"
description: "Preparing your machine"
---

## Prerequisites
## Overview

The 01 project is an open-source ecosystem for artificially intelligent devices. By combining code-interpreting language models ("interpreters") with speech recognition and voice synthesis, the 01's flagship operating system ("01") can power conversational, computer-operating AI devices similar to the Rabbit R1 or the Humane Pin.

Our goal is to become the "Linux" of this new space—open, modular, and free for personal or commercial use.

<Note>The current version of 01 is a developer preview.</Note>

## Components

The 01 consists of two main components:

### Server

The server runs on your computer and acts as the brain of the 01 system. It:

- Passes input to the interpreter
- Executes commands on your computer
- Returns responses

### Client

The client captures audio for controlling computers that run the 01 server. It:

- Transmits audio to the server
- Plays back responses

## Prerequisites

To run the 01 on your computer, you will need to install a few essential packages.

### What is Poetry?

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on, and it will manage (install/update) them for you. We use Poetry to ensure that everyone running 01 has the same environment and dependencies.
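For illustration only, the kind of manifest Poetry manages looks like this; the project and dependency names below are a generic sketch, not the 01's actual `pyproject.toml`:

```toml
[tool.poetry]
name = "example-app"        # hypothetical project name
version = "0.1.0"
description = "Illustrative manifest managed by Poetry"

[tool.poetry.dependencies]
python = "^3.10"            # interpreter constraint
requests = "^2.31"          # a pinned third-party library
```

Running `poetry install` against such a file resolves the declared dependencies and installs them into an isolated virtual environment, so every contributor gets the same versions.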
<Card
  title="Install Poetry"
  icon="link"
@@ -15,13 +46,23 @@ To run the 01 on your computer, you will need to install a few essential package
To install Poetry, follow the official guide here.
</Card>

### MacOS
### Operating Systems

#### MacOS

On MacOS, we use Homebrew (a package manager) to install the required dependencies. Run the following command in your terminal:

```bash
brew install portaudio ffmpeg cmake
```

### Ubuntu
This command installs:

- [PortAudio](https://www.portaudio.com/): A cross-platform audio I/O library
- [FFmpeg](https://www.ffmpeg.org/): A complete, cross-platform solution for recording, converting, and streaming audio and video
- [CMake](https://cmake.org/): An open-source, cross-platform family of tools designed to build, test and package software

#### Ubuntu

<Note>Wayland is not supported; only Ubuntu 20.04 and below.</Note>
@@ -29,7 +70,13 @@ brew install portaudio ffmpeg cmake
sudo apt-get install portaudio19-dev ffmpeg cmake
```

### Windows
This command installs:

- [PortAudio](https://www.portaudio.com/): A cross-platform audio I/O library
- [FFmpeg](https://www.ffmpeg.org/): A complete solution for recording, converting, and streaming audio and video
- [CMake](https://cmake.org/): An open-source, cross-platform family of tools designed to build, test and package software

#### Windows

- [Git for Windows](https://git-scm.com/download/win).
- [Chocolatey](https://chocolatey.org/install#individual) to install the required packages.
@@ -9,7 +9,7 @@ description: "The open-source language model computer"
style={{ transform: "translateY(-1.25rem)" }}
/>

The **01** is an open-source platform for conversational devices, inspired by the *Star Trek* computer.
The **01** is an open-source platform for conversational devices, inspired by the _Star Trek_ computer.

With [Open Interpreter](https://github.com/OpenInterpreter/open-interpreter) at its core, the **01** is more natural, flexible, and capable than its predecessors. Assistants built on **01** can:

@@ -19,7 +19,7 @@ With [Open Interpreter](https://github.com/OpenInterpreter/open-interpreter) at
- Control third-party software
- ...

<br>
<br></br>

We intend to become the GNU/Linux of this space by staying open, modular, and free.
@@ -6,5 +6,3 @@ description: "The 01 light"
The 01 light is an open-source voice interface.

The first body was designed to be push-to-talk and handheld, but the core chip can be built into standalone bodies with hardcoded wifi credentials.

[MORE COMING SOON]
@@ -0,0 +1,34 @@
---
title: "Community Apps"
description: "Apps built by the community"
---

## Native iOS app by [eladdekel](https://github.com/eladdekel)

To run it on your device, you can either install the app directly through the current TestFlight [here](https://testflight.apple.com/join/v8SyuzMT), or build from the source code files in Xcode on your Mac.

### Instructions

- [Install 01 software](/software/installation) on your machine

- In Xcode, open the 'zerooone-app' project file in the project folder, change the Signing Team and Bundle Identifier, and build.

### Using the App

The app has four features:

1. The Speak Button

Made to emulate the button on the hardware models of 01, the big yellow circle in the middle of the screen is what you hold while you speak to the model; let go when you're finished speaking.

2. The Settings Button

Tapping the settings button lets you enter your websocket address so that the app can properly connect to your computer.

3. The Reconnect Button

The arrow will be RED when the websocket connection is not live, and GREEN when it is. If you make changes, you can reconnect by tapping the arrow button (or simply start holding the speak button).

4. The Terminal Button

The terminal button lets you see all response text coming in from the server side of the 01. You can toggle it by tapping the button; each toggle clears the on-device cache of text.
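
For illustration, the websocket address is typically your computer's LAN IP plus the 01 server's port (`10001` is the documented default); the IP below is a placeholder:

```text
ws://192.168.1.20:10001
```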
@@ -1,10 +1,8 @@
---
title: "Android"
description: "Control 01 from your Android phone"
title: "Development"
description: "How to get your 01 mobile app"
---

Using your phone is a great way to control 01. There are multiple options available.

## [React Native app](https://github.com/OpenInterpreter/01/tree/main/software/source/clients/mobile)

A work in progress; we will continue to improve this application.
@@ -0,0 +1,15 @@
---
title: "Download"
description: "How to get your 01 mobile app"
---

Using your phone is a great way to control 01. There are multiple options available.

<CardGroup cols={2}>
  <Card title="iOS" icon="apple">
    Coming soon
  </Card>
  <Card title="Android" icon="android">
    Coming soon
  </Card>
</CardGroup>
@@ -1,73 +0,0 @@
---
title: "iOS"
description: "Control 01 from your iOS phone"
---

Using your phone is a great way to control 01. There are multiple options available.

## [React Native app](https://github.com/OpenInterpreter/01/tree/main/software/source/clients/mobile)

A work in progress; we will continue to improve this application.

If you want to run it on your device, you will need to install [Expo Go](https://expo.dev/go) on your mobile device.

### Setup Instructions

- [Install 01 software](/software/installation) on your machine

- Run the Expo server:

```shell
cd software/source/clients/mobile/react-native
npm install    # install dependencies
npx expo start # start local expo development server
```

This will produce a QR code that you can scan with Expo Go on your mobile device.

Open **Expo Go** on your mobile device and select _Scan QR code_ to scan the QR code produced by the `npx expo start` command.

- Run 01:

```shell
cd software            # cd into `software`
poetry run 01 --mobile # exposes QR code for 01 Light server
```

### Using the App

In the 01 mobile app, select _Scan Code_ to scan the QR code produced by the `poetry run 01 --mobile` command.

Press and hold the button to speak, and release to make the request. To rescan the QR code, swipe left on the screen to go back.

## [Native iOS app](https://github.com/OpenInterpreter/01/tree/main/software/source/clients/ios) by [eladdekel](https://github.com/eladdekel)

A community contribution ❤️

To run it on your device, you can either install the app directly through the current TestFlight [here](https://testflight.apple.com/join/v8SyuzMT), or build from the source code files in Xcode on your Mac.

### Instructions

- [Install 01 software](/software/installation) on your machine

- In Xcode, open the 'zerooone-app' project file in the project folder, change the Signing Team and Bundle Identifier, and build.

### Using the App

The app has four features:

1. The Speak Button

Made to emulate the button on the hardware models of 01, the big yellow circle in the middle of the screen is what you hold while you speak to the model; let go when you're finished speaking.

2. The Settings Button

Tapping the settings button lets you enter your websocket address so that the app can properly connect to your computer.

3. The Reconnect Button

The arrow will be RED when the websocket connection is not live, and GREEN when it is. If you make changes, you can reconnect by tapping the arrow button (or simply start holding the speak button).

4. The Terminal Button

The terminal button lets you see all response text coming in from the server side of the 01. You can toggle it by tapping the button; each toggle clears the on-device cache of text.
@@ -0,0 +1,85 @@
---
title: "Privacy Policy"
---

Last updated: August 8th, 2024

## 1. Introduction

Welcome to the 01 App. We are committed to protecting your privacy and providing a safe, AI-powered chat experience. This Privacy Policy explains how we collect, use, and protect your information when you use our app.

## 2. Information We Collect

### 2.1 When Using Our Cloud Service

If you choose to use our cloud service, we collect and store:

- Your email address
- Transcriptions of your interactions with our AI assistant
- Any images you send to or receive from the AI assistant

### 2.2 When Using a Self-Hosted Server

If you connect to your own self-hosted server, we do not collect or store any of your data, including your email address.

## 3. How We Use Your Information

We use the collected information solely for the purpose of providing and improving our AI chat service. This includes:

- Facilitating communication between you and our AI assistant
- Improving the accuracy and relevance of AI responses
- Analyzing usage patterns to enhance user experience

## 4. Data Storage and Security

We take appropriate measures to protect your data from unauthorized access, alteration, or destruction. All data is stored securely and accessed only by authorized personnel.

## 5. Data Sharing and Third-Party Services

We do not sell, trade, or otherwise transfer your personally identifiable information to outside parties. This does not include trusted third parties who assist us in operating our app, conducting our business, or servicing you, as long as those parties agree to keep this information confidential.

We may use third-party services for analytics and app functionality. These services may collect anonymous usage data to help us improve the app.

## 6. Data Retention and Deletion

We retain your data for as long as your account is active or as needed to provide you services. If you wish to cancel your account or request that we no longer use your information, please contact us using the information in Section 11.

## 7. Your Rights

You have the right to:

- Access the personal information we hold about you
- Request correction of any inaccurate information
- Request deletion of your data from our systems

To exercise these rights, please contact us using the information provided in Section 11.

## 8. Children's Privacy

Our app is not intended for children under the age of 13. We do not knowingly collect personal information from children under 13. If you are a parent or guardian and you are aware that your child has provided us with personal information, please contact us.

## 9. International Data Transfer

Your information, including personal data, may be transferred to — and maintained on — computers located outside of your state, province, country or other governmental jurisdiction where the data protection laws may differ from those in your jurisdiction.

## 10. Changes to This Privacy Policy

We may update our Privacy Policy from time to time. We will notify you of any changes by posting the new Privacy Policy on this page and updating the "Last updated" date.

## 11. Contact Us

If you have any questions about this Privacy Policy, please contact us at:

Email: help@openinterpreter.com

## 12. California Privacy Rights

If you are a California resident, you have the right to request information regarding the disclosure of your personal information to third parties for direct marketing purposes, and to opt out of such disclosures. As stated in this Privacy Policy, we do not share your personal information with third parties for direct marketing purposes.

## 13. Cookies and Tracking

Our app does not use cookies or web tracking technologies.

## 14. Consent

By using the 01 App, you consent to this Privacy Policy.
@@ -39,12 +39,27 @@
        "getting-started/getting-started"
      ]
    },
    {
      "group": "Safety",
      "pages": [
        "safety/introduction",
        "safety/risks",
        "safety/measures"
      ]
    },
    {
      "group": "Software Setup",
      "pages": [
        "software/introduction",
        "software/installation",
        "software/run",
        {
          "group": "Server",
          "pages": [
            "software/server/introduction",
            "software/server/livekit-server",
            "software/server/light-server"
          ]
        },
        "software/configure",
        "software/flags"
      ]
@@ -74,20 +89,25 @@
    {
      "group": "Mobile",
      "pages": [
        "hardware/mobile/ios",
        "hardware/mobile/android",
        "hardware/mobile/privacy"
        "hardware/mobile/download",
        "hardware/mobile/development",
        "hardware/mobile/community-apps"
      ]
    }
  ]
},
{
  "group": "Troubleshooting",
  "pages": ["troubleshooting/faq"]
  "pages": [
    "troubleshooting/faq"
  ]
},
{
  "group": "Legal",
  "pages": ["legal/fulfillment-policy"]
  "pages": [
    "legal/fulfillment-policy",
    "legal/privacy"
  ]
}
],
"feedback": {
@@ -98,4 +118,4 @@
"github": "https://github.com/OpenInterpreter/01",
"discord": "https://discord.com/invite/Hvz9Axh84z"
}
}
}
@@ -0,0 +1,29 @@
---
title: "Introduction"
description: "Critical safety information for 01 users"
---

<Warning>This experimental project is under rapid development and lacks basic safeguards. Until a stable `1.0` release, **only run the 01 on devices without access to sensitive information.**</Warning>

The 01 is an experimental voice assistant that can execute code based on voice commands. This power comes with significant risks that all users must understand.

<CardGroup cols={2}>
  <Card title="Key Risks" href="/safety/risks">
    Understand the dangers
  </Card>
  <Card title="Safety Measures" href="/safety/measures">
    Protect yourself and your system
  </Card>
</CardGroup>

## Why Safety Matters

The 01 directly interacts with your system, executing code without showing it to you first. This means:

1. It can make changes to your files and system settings instantly.
2. Misinterpretations of your commands can lead to unintended actions.
3. The AI may not fully understand the context or implications of its actions.

Always approach using the 01 with caution. It's not your usual voice assistant – **the 01 is a powerful tool that can alter your digital environment in seconds.**

<Warning>Remember: The 01 is experimental technology. Your safety depends on your understanding of its capabilities and limitations.</Warning>
@@ -0,0 +1,76 @@
---
title: "Measures"
description: "Essential steps to protect yourself when using 01"
---

**The 01 requires a proactive approach to safety.**

This section provides essential measures to protect your system and data when using the 01. Each measure is accompanied by specific tool recommendations to help you implement these safety practices effectively.

By following these guidelines, you can *somewhat* minimize risks and use the 01 with greater confidence, but **the 01 is nonetheless an experimental technology that may not be suitable for everyone.**

## 1. Comprehensive Backups

Before using the 01, ensure you have robust, up-to-date backups:

- Use reliable backup software to create full system images:
  - For Windows: [Macrium Reflect Free](https://www.macrium.com/reflectfree)
  - For macOS: Time Machine (built-in) or [Carbon Copy Cloner](https://bombich.com/)
  - For Linux: [Clonezilla](https://clonezilla.org/)
- Store backups on external drives or trusted cloud services like [Backblaze](https://www.backblaze.com/) or [iDrive](https://www.idrive.com/).
- Regularly test your backups to ensure they can be restored.
- Keep at least one backup offline and disconnected from your network.

Remember: A good backup is your last line of defense against unintended changes or data loss.

## 2. Use a Dedicated Environment

Isolate the 01 to minimize potential damage:

- Run the 01 in a virtual machine if possible. [VirtualBox](https://www.virtualbox.org/) is a free, cross-platform option.
- If not, create a separate user account with limited permissions for 01 use.
- Consider using a separate, non-essential device for 01 experiments.

## 3. Network Isolation

Limit the 01's ability to affect your network:

- Use a firewall to restrict the 01's network access. Windows and macOS have built-in firewalls; for Linux, consider [UFW](https://help.ubuntu.com/community/UFW).
- Consider running the 01 behind a VPN for an additional layer of isolation. [ProtonVPN](https://protonvpn.com/) offers a free tier.
- Disable unnecessary network services when using the 01.

## 4. Vigilant Monitoring

Stay alert during 01 usage:

- Pay close attention to the 01's actions and your system's behavior.
- Be prepared to quickly terminate the 01 if you notice anything suspicious.
- Regularly check system logs and monitor for unexpected changes.

## 5. Careful Command Formulation

Be precise and cautious with your voice commands:

- Start with simple, specific tasks before attempting complex operations.
- Avoid ambiguous language that could be misinterpreted.
- When possible, specify limitations or constraints in your commands.

## 6. Regular System Audits

Periodically check your system's integrity:

- Review important files and settings after using the 01.
- Use system comparison tools to identify changes made during 01 sessions:
  - For Windows: [WinMerge](https://winmerge.org/)
  - For macOS/Linux: [Meld](https://meldmerge.org/)
- Promptly investigate and address any unexpected modifications.
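
A lightweight way to audit without extra tools is to record checksums of the files you care about before a session and compare them afterwards; a minimal sketch (the `/tmp` demo directory is a placeholder for your real files):

```shell
# Record checksums before an 01 session, then compare after to spot changes
dir=/tmp/audit-demo
mkdir -p "$dir"
echo "original" > "$dir/config.txt"
sha256sum "$dir"/* > /tmp/audit-before.txt
# ...an 01 session runs and modifies a file...
echo "changed" > "$dir/config.txt"
sha256sum "$dir"/* > /tmp/audit-after.txt
# diff exits non-zero when any checksum differs
diff /tmp/audit-before.txt /tmp/audit-after.txt || echo "files changed during session"
```

Any line that appears in the diff output identifies a file that was modified while the 01 was running.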

## 7. Stay Informed

Keep up with 01 developments:

- Regularly check for updates to the 01 software.
- Stay informed about newly discovered risks or vulnerabilities.
- Follow best practices shared by the 01 developer community.

By following these measures, you can significantly reduce the risks associated with using the 01. Remember, your active involvement in maintaining safety is crucial when working with this powerful, experimental technology.
@@ -0,0 +1,54 @@
---
title: "Risks"
description: "Understanding the dangers of using 01"
---

The 01 voice assistant offers powerful control over your digital environment through natural language commands.

However, this capability comes with **significant risks.** Understanding these risks is crucial for safe and responsible use of the 01.

This section outlines the key dangers associated with the 01's ability to execute code instantly based on voice input. Being aware of these risks is the first step in using the 01 effectively and safely.

## Immediate Code Execution

The 01 executes code directly based on voice commands, without showing you the code first. This means:

- Actions are taken instantly, giving you no chance to review or stop them.
- Misinterpretations of your commands can lead to immediate, unintended consequences.
- Complex or ambiguous requests might result in unexpected system changes.

## System and Data Vulnerability

Your entire system is potentially accessible to the 01, including:

- Important files and documents
- System settings and configurations
- Personal and sensitive information

A misinterpreted command could lead to data loss, system misconfiguration, or privacy breaches.

## Prompt Injection Vulnerability

The 01 processes text from various sources, making it susceptible to prompt injection attacks:

- Malicious instructions could be hidden in emails, documents, or websites.
- If the 01 processes this text, it might execute harmful commands without your knowledge.
- This could lead to unauthorized actions, data theft, or system compromise.

## Lack of Context Understanding

While powerful, the 01's AI may not fully grasp the broader context of your digital environment:

- It might not understand the importance of certain files or settings.
- The AI could make changes that conflict with other software or system requirements.
- Long-term consequences of actions might not be apparent to the AI.

## Experimental Nature

Remember, the 01 is cutting-edge, experimental technology:

- Unexpected behaviors or bugs may occur.
- The full extent of potential risks is not yet known.
- Safety measures may not cover all possible scenarios.

Understanding these risks is crucial for safe use of the 01. Always err on the side of caution, especially when dealing with important data or system configurations.
@@ -133,3 +133,11 @@ For local TTS, Coqui is used.
# Set your profile with a local TTS service
interpreter.tts = "coqui"
```

<Note>
  When using the Livekit server, the `interpreter.tts` setting in your profile
  will be ignored. The Livekit server currently only works with Deepgram for
  speech recognition and Eleven Labs for text-to-speech. We are working on
  introducing all-local functionality for the Livekit server as soon as
  possible.
</Note>
@@ -7,10 +7,12 @@ description: "Customize the behaviour of your 01 from the CLI"

### Server

Runs the server.
Specify the server to run.

Valid arguments are either [livekit](/software/livekit-server) or [light](/software/light-server)

```
poetry run 01 --server
poetry run 01 --server light
```

### Server Host
@@ -33,19 +35,6 @@ Default: `10001`.
poetry run 01 --server-port 10001
```

### Tunnel Service

Specify the tunnel service.

Default: `ngrok`.

```
poetry run 01 --tunnel-service ngrok
```

Specify the tunnel service.
Default: `ngrok`.

### Expose

Expose server to internet.
@@ -56,10 +45,12 @@ poetry run 01 --expose

### Client

Run client.
Specify the client.

Valid argument is `light-python`

```
poetry run 01 --client
poetry run 01 --client light-python
```

### Server URL
@@ -73,18 +64,6 @@ Default: `None`.
poetry run 01 --server-url http://0.0.0.0:10001
```

### Client Type

Specify the client type.

Default: `auto`.

```
poetry run 01 --client-type auto
```

Default: `auto`.

### QR

Display QR code to scan to connect to the server.
@@ -28,4 +28,4 @@ Install your project along with its dependencies in a virtual environment manage
poetry install
```

Now you should be ready to [run your 01](/software/run).
Now you should be ready to [run your 01](/software/server/introduction).
@@ -1,16 +1,8 @@
---
title: "Software"
title: "Overview"
description: "The software that powers 01"
---

## Overview

The 01 project is an open-source ecosystem for artificially intelligent devices. By combining code-interpreting language models ("interpreters") with speech recognition and voice synthesis, the 01's flagship operating system ("01") can power conversational, computer-operating AI devices similar to the Rabbit R1 or the Humane Pin.

Our goal is to become the "Linux" of this new space—open, modular, and free for personal or commercial use.

<Note>The current version of 01 is a developer preview.</Note>

## Components

The 01 software consists of two main components:

@@ -43,7 +35,7 @@ One of the key features of the 01 ecosystem is its modularity. You can:
To begin using 01:

1. [Install](/software/installation) the software
2. [Run](/software/run) the Server
2. [Run](/software/server/introduction) the Server
3. [Connect](/hardware/01-light/connect) the Client

For more advanced usage, check out our guides on [configuration](/software/configure).
@@ -1,18 +0,0 @@
---
title: "Run"
description: "Run your 01"
---

<Info> Make sure that you have navigated to the `software` directory. </Info>

To run the server and the client:

```bash
poetry run 01
```

To run the 01 server:

```bash
poetry run 01 --server
```
@@ -0,0 +1,19 @@
---
title: "Choosing a server"
description: "The servers that power 01"
---

<CardGroup cols={2}>
  <Card title="Light" href="/software/server/light-server">
    Light Server
  </Card>
  <Card title="Livekit" href="/software/server/livekit-server">
    Livekit Server
  </Card>
</CardGroup>

## Livekit vs. Light Server

- **Livekit Server**: Designed for devices with higher processing power, such as phones, web browsers, and more capable hardware. It offers a full range of features and robust performance.

- **Light Server**: A lightweight server designed specifically for ESP32 devices, optimized for low-power, constrained environments.
@@ -0,0 +1,28 @@
---
title: "Light Server"
description: "A lightweight voice server for your 01"
---

## Overview

The Light server streams bytes of audio to an ESP32 and the Light Python client.

### Key Features

- Lightweight
- Works with ESP32
- Can use local options for Speech-to-Text and Text-to-Speech

## Getting Started

### Prerequisites

Make sure you have navigated to the `software` directory before proceeding.

### Starting the Server

To start the Light server, run the following command:

```bash
poetry run 01 --server light
```
@@ -0,0 +1,129 @@
---
title: "Livekit Server"
description: "A robust, feature-rich voice server for your 01"
---

## Overview

[Livekit](https://livekit.io/) is a powerful, open-source WebRTC server and client SDK that enables real-time audio communication. It's designed for applications that require robust, scalable real-time features.

### Key Features

- Scalable architecture
- Extensive documentation and community support
- SDKs for various languages and platforms (web, mobile, desktop)

## Getting Started

### Prerequisites

Make sure you have navigated to the `software` directory before proceeding.

### Installing Livekit

Before setting up the environment, you need to install Livekit. Follow the instructions for your operating system:

- **macOS**:

  ```bash
  brew install livekit
  ```

- **Linux**:

  ```bash
  curl -sSL https://get.livekit.io | bash
  ```

- **Windows**:
  Download the latest release from [Livekit Releases](https://github.com/livekit/livekit/releases/tag/v1.7.2).

### Environment Setup

1. Create a `.env` file in the `/software` directory with the following content:

```env
ELEVEN_API_KEY=your_eleven_labs_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
NGROK_AUTHTOKEN=your_ngrok_auth_token
```

Replace the placeholders with your actual API keys.
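
Assuming the 01 launcher reads this file itself, you normally don't need to do anything further; but if you also want the same variables available in a manual shell session, a minimal POSIX sketch works (the demo file under `/tmp` stands in for `software/.env`):

```shell
# Load KEY=value pairs from a .env-style file into the current shell
envfile=/tmp/demo.env
printf 'DEEPGRAM_API_KEY=dg_dummy\nNGROK_AUTHTOKEN=ng_dummy\n' > "$envfile"
set -a            # export every variable assigned while sourcing
. "$envfile"
set +a
echo "$DEEPGRAM_API_KEY"
```

The `set -a` / `set +a` pair makes every assignment in the sourced file an exported environment variable, visible to any process you launch afterwards.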
|
||||
|
||||
<CardGroup cols={3}>
|
||||
<Card title="Eleven Labs" icon="microphone" href="https://beta.elevenlabs.io">
|
||||
Get your Eleven Labs API key for text-to-speech
|
||||
</Card>
|
||||
<Card
|
||||
title="Deepgram"
|
||||
icon="waveform-lines"
|
||||
href="https://console.deepgram.com"
|
||||
>
|
||||
Obtain your Deepgram API key for speech recognition
|
||||
</Card>
|
||||
<Card title="Ngrok" icon="wifi" href="https://dashboard.ngrok.com">
|
||||
Sign up for Ngrok and get your auth token
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
### Starting the Server
|
||||
|
||||
To start the Livekit server, run the following command:
|
||||
|
||||
```bash
|
||||
poetry run 01 --server livekit
|
||||
```
|
||||
|
||||
To generate a QR code for scanning:
|
||||
|
||||
```bash
|
||||
poetry run 01 --server livekit --qr
|
||||
```
|
||||
|
||||
To expose the server over the internet via ngrok:
|
||||
|
||||
```bash
|
||||
poetry run 01 --server livekit --expose
|
||||
```
|
||||
|
||||
To use the mobile app over the web, use both flags:
|
||||
|
||||
```bash
|
||||
poetry run 01 --server livekit --qr --expose
|
||||
```
|
||||
|
||||
<Note>
|
||||
Currently, our Livekit server only works with Deepgram and Eleven Labs. We are
|
||||
working to introduce all-local functionality as soon as possible. By setting
|
||||
your profile (see [Configure Your Profile](/software/configure)), you can
|
||||
still change your LLM to be a local LLM, but the `interpreter.tts` value will
|
||||
be ignored for the Livekit server.
|
||||
</Note>
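For example, a profile that swaps in a local LLM might look like the fragment below. This is a hypothetical sketch: the model name and endpoint are placeholders, and the `interpreter.tts` line is shown only to illustrate the value the Livekit server ignores.

```python
# Hypothetical profile fragment — model name and api_base are placeholders.
interpreter.llm.model = "ollama/llama3.1"            # any local model your setup serves
interpreter.llm.api_base = "http://localhost:11434"  # your local inference endpoint
interpreter.tts = "coqui"  # ignored when running the Livekit server
```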
|
||||
|
||||
## Livekit vs. Light Server
|
||||
|
||||
- **Livekit Server**: Designed for devices with higher processing power, such as phones, web browsers, and more capable hardware. It offers a full range of features and robust performance.
|
||||
|
||||
- **Light Server**: A lightweight alternative designed specifically for ESP32 devices. It's optimized for low-power, constrained environments.
|
||||
|
||||
## SDK Integration
|
||||
|
||||
Livekit provides SDKs for various programming languages and platforms, allowing you to easily integrate real-time communication features into your applications.
|
||||
|
||||
### Available SDKs
|
||||
|
||||
- JavaScript/TypeScript
|
||||
- React
|
||||
- React Native
|
||||
- iOS (Swift)
|
||||
- Android (Kotlin)
|
||||
- Flutter
|
||||
- Unity
|
||||
|
||||
<Card
|
||||
title="Explore Livekit SDKs"
|
||||
icon="code"
|
||||
href="https://docs.livekit.io/client-sdk-js/"
|
||||
>
|
||||
Find documentation and integration guides for all Livekit SDKs.
|
||||
</Card>
|
|
@ -28,6 +28,11 @@ description: "Frequently Asked Questions"
|
|||
control.
|
||||
</Accordion>
|
||||
|
||||
<Accordion title="My app is stuck on the 'Starting...' screen. What do I do?">
|
||||
You might need to re-install the Poetry environment. In the `software`
|
||||
directory, please run `poetry env remove --all` followed by `poetry install`.
|
||||
</Accordion>
|
||||
|
||||
<Accordion title="Can an 01 device connect to the desktop app, or do general customers/consumers need to set it up in their terminal?">
|
||||
We are working on letting external devices connect to the desktop app, but for now
|
||||
the 01 will need to connect to the Python server.
|
||||
|
|
107
software/main.py
107
software/main.py
|
@ -1,16 +1,3 @@
|
|||
"""
|
||||
01 # Runs light server and light simulator
|
||||
|
||||
01 --server livekit # Runs livekit server only
|
||||
01 --server light # Runs light server only
|
||||
|
||||
01 --client light-python
|
||||
|
||||
... --expose # Exposes the server with ngrok
|
||||
... --expose --domain <domain> # Exposes the server on a specific ngrok domain
|
||||
... --qr # Displays a qr code
|
||||
"""
|
||||
|
||||
from yaspin import yaspin
|
||||
spinner = yaspin()
|
||||
spinner.start()
|
||||
|
@ -23,12 +10,17 @@ import os
|
|||
import importlib
|
||||
from source.server.server import start_server
|
||||
import subprocess
|
||||
import webview
|
||||
import socket
|
||||
import json
|
||||
import segno
|
||||
from livekit import api
|
||||
import time
|
||||
from dotenv import load_dotenv
|
||||
import signal
|
||||
from source.server.livekit.worker import main as worker_main
|
||||
import warnings
|
||||
import requests
|
||||
|
||||
load_dotenv()
|
||||
|
||||
|
@ -127,19 +119,21 @@ def run(
|
|||
|
||||
if server == "light":
|
||||
light_server_port = server_port
|
||||
light_server_host = server_host
|
||||
voice = True # The light server will support voice
|
||||
elif server == "livekit":
|
||||
# The light server should run at a different port if we want to run a livekit server
|
||||
spinner.stop()
|
||||
print(f"Starting light server (required for livekit server) on the port before `--server-port` (port {server_port-1}), unless the `AN_OPEN_PORT` env var is set.")
|
||||
print(f"Starting light server (required for livekit server) on localhost, on the port before `--server-port` (port {server_port-1}), unless the `AN_OPEN_PORT` env var is set.")
|
||||
print(f"The livekit server will be started on port {server_port}.")
|
||||
light_server_port = os.getenv('AN_OPEN_PORT', server_port-1)
|
||||
light_server_host = "localhost"
|
||||
voice = False # The light server will NOT support voice. It will just run Open Interpreter. The Livekit server will handle voice
|
||||
|
||||
server_thread = threading.Thread(
|
||||
target=start_server,
|
||||
args=(
|
||||
server_host,
|
||||
light_server_host,
|
||||
light_server_port,
|
||||
profile,
|
||||
voice,
|
||||
|
@ -159,25 +153,18 @@ def run(
|
|||
subprocess.run(command, shell=True, check=True)
|
||||
|
||||
# Start the livekit server
|
||||
if debug:
|
||||
command = f'livekit-server --dev --bind "{server_host}" --port {server_port}'
|
||||
else:
|
||||
command = f'livekit-server --dev --bind "{server_host}" --port {server_port} > /dev/null 2>&1'
|
||||
livekit_thread = threading.Thread(
|
||||
target=run_command, args=(f'livekit-server --dev --bind "{server_host}" --port {server_port}',)
|
||||
target=run_command, args=(command,)
|
||||
)
|
||||
time.sleep(7)
|
||||
livekit_thread.start()
|
||||
threads.append(livekit_thread)
|
||||
|
||||
# We communicate with the livekit worker via environment variables:
|
||||
os.environ["INTERPRETER_SERVER_HOST"] = server_host
|
||||
os.environ["INTERPRETER_LIGHT_SERVER_PORT"] = str(light_server_port)
|
||||
os.environ["LIVEKIT_URL"] = f"ws://{server_host}:{server_port}"
|
||||
|
||||
# Start the livekit worker
|
||||
worker_thread = threading.Thread(
|
||||
target=run_command, args=("python source/server/livekit/worker.py dev",) # TODO: This should not be a CLI, it should just run the python file
|
||||
)
|
||||
time.sleep(7)
|
||||
worker_thread.start()
|
||||
threads.append(worker_thread)
|
||||
local_livekit_url = f"ws://{server_host}:{server_port}"
|
||||
|
||||
if expose:
|
||||
|
||||
|
@ -199,15 +186,6 @@ def run(
|
|||
print("Livekit server will run at:", url)
|
||||
|
||||
|
||||
### DISPLAY QR CODE
|
||||
|
||||
if qr:
|
||||
time.sleep(7)
|
||||
content = json.dumps({"livekit_server": url})
|
||||
qr_code = segno.make(content)
|
||||
qr_code.terminal(compact=True)
|
||||
|
||||
|
||||
### CLIENT
|
||||
|
||||
if client:
|
||||
|
@ -239,6 +217,61 @@ def run(
|
|||
signal.signal(signal.SIGTERM, signal_handler)
|
||||
|
||||
try:
|
||||
|
||||
# Verify the server is running
|
||||
for attempt in range(10):
|
||||
try:
|
||||
response = requests.get(url)
|
||||
status = "OK" if response.status_code == 200 else "Not OK"
|
||||
if status == "OK":
|
||||
break
|
||||
except requests.RequestException:
|
||||
pass
|
||||
time.sleep(1)
|
||||
else:
|
||||
raise Exception(f"Server at {url} failed to respond after 10 attempts")
|
||||
|
||||
### DISPLAY QR CODE
|
||||
if qr:
|
||||
def display_qr_code():
|
||||
time.sleep(10)
|
||||
content = json.dumps({"livekit_server": url})
|
||||
qr_code = segno.make(content)
|
||||
qr_code.terminal(compact=True)
|
||||
|
||||
qr_thread = threading.Thread(target=display_qr_code)
|
||||
qr_thread.start()
|
||||
threads.append(qr_thread)
|
||||
|
||||
### START LIVEKIT WORKER
|
||||
if server == "livekit":
|
||||
time.sleep(7)
|
||||
# These are needed to communicate with the worker's entrypoint
|
||||
os.environ['INTERPRETER_SERVER_HOST'] = light_server_host
|
||||
os.environ['INTERPRETER_SERVER_PORT'] = str(light_server_port)
|
||||
|
||||
token = str(api.AccessToken('devkey', 'secret') \
|
||||
.with_identity("identity") \
|
||||
.with_name("my name") \
|
||||
.with_grants(api.VideoGrants(
|
||||
room_join=True,
|
||||
room="my-room",
|
||||
)).to_jwt())
|
||||
|
||||
meet_url = f'https://meet.livekit.io/custom?liveKitUrl={url.replace("http", "ws")}&token={token}\n\n'
|
||||
print(meet_url)
|
||||
|
||||
for attempt in range(30):
|
||||
try:
|
||||
worker_main(local_livekit_url)
|
||||
except KeyboardInterrupt:
|
||||
print("Exiting.")
|
||||
raise
|
||||
except Exception as e:
|
||||
print(f"Error occurred: {e}")
|
||||
print("Retrying...")
|
||||
time.sleep(1)
|
||||
|
||||
# Wait for all threads to complete
|
||||
for thread in threads:
|
||||
thread.join()
|
||||
|
|
File diff suppressed because one or more lines are too long
|
@ -19,12 +19,13 @@ livekit-plugins-openai = "^0.8.1"
|
|||
livekit-plugins-silero = "^0.6.4"
|
||||
livekit-plugins-elevenlabs = "^0.7.3"
|
||||
segno = "^1.6.1"
|
||||
open-interpreter = {extras = ["os", "server"], version = "^0.3.9"}
|
||||
open-interpreter = {extras = ["os", "server"], version = "^0.3.12"} # You should add a "browser" extra, so selenium isn't in the main package
|
||||
ngrok = "^1.4.0"
|
||||
realtimetts = {extras = ["all"], version = "^0.4.5"}
|
||||
realtimestt = "^0.2.41"
|
||||
pynput = "^1.7.7"
|
||||
yaspin = "^3.0.2"
|
||||
pywebview = "^5.2"
|
||||
|
||||
[build-system]
|
||||
requires = ["poetry-core"]
|
||||
|
|
|
@ -7,41 +7,77 @@ from livekit import rtc
|
|||
from livekit.agents.voice_assistant import VoiceAssistant
|
||||
from livekit.plugins import deepgram, openai, silero, elevenlabs
|
||||
from dotenv import load_dotenv
|
||||
import sys
|
||||
import numpy as np
|
||||
|
||||
load_dotenv()
|
||||
|
||||
start_message = """Hi! You can hold the white circle below to speak to me.
|
||||
|
||||
Try asking what I can do."""
|
||||
|
||||
# This function is the entrypoint for the agent.
|
||||
async def entrypoint(ctx: JobContext):
|
||||
# Create an initial chat context with a system prompt
|
||||
initial_ctx = ChatContext().append(
|
||||
role="system",
|
||||
text=(
|
||||
"You are a voice assistant created by LiveKit. Your interface with users will be voice. "
|
||||
"You should use short and concise responses, and avoid usage of unpronounceable punctuation."
|
||||
"" # Open Interpreter handles this.
|
||||
),
|
||||
)
|
||||
|
||||
# Connect to the LiveKit room
|
||||
await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
|
||||
|
||||
# Create a black background with a white circle
|
||||
width, height = 640, 480
|
||||
image_np = np.zeros((height, width, 4), dtype=np.uint8)
|
||||
|
||||
# Create a white circle
|
||||
center = (width // 2, height // 2)
|
||||
radius = 50
|
||||
y, x = np.ogrid[:height, :width]
|
||||
mask = ((x - center[0])**2 + (y - center[1])**2) <= radius**2
|
||||
image_np[mask] = [255, 255, 255, 255] # White color with full opacity
|
||||
|
||||
source = rtc.VideoSource(width, height)
|
||||
track = rtc.LocalVideoTrack.create_video_track("static_image", source)
|
||||
|
||||
options = rtc.TrackPublishOptions()
|
||||
options.source = rtc.TrackSource.SOURCE_CAMERA
|
||||
publication = await ctx.room.local_participant.publish_track(track, options)
|
||||
|
||||
# Function to continuously publish the static image
|
||||
async def publish_static_image():
|
||||
while True:
|
||||
frame = rtc.VideoFrame(width, height, rtc.VideoBufferType.RGBA, image_np.tobytes())
|
||||
source.capture_frame(frame)
|
||||
await asyncio.sleep(1/30) # Publish at 30 fps
|
||||
|
||||
# Start publishing the static image
|
||||
asyncio.create_task(publish_static_image())
|
||||
|
||||
# VoiceAssistant is a class that creates a full conversational AI agent.
|
||||
# See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
|
||||
# for details on how it works.
|
||||
|
||||
interpreter_server_host = os.getenv('INTERPRETER_SERVER_HOST', '0.0.0.0')
|
||||
interpreter_server_port = os.getenv('INTERPRETER_LIGHT_SERVER_PORT', '8000')
|
||||
|
||||
interpreter_server_host = os.getenv('INTERPRETER_SERVER_HOST', 'localhost')
|
||||
interpreter_server_port = os.getenv('INTERPRETER_SERVER_PORT', '8000')
|
||||
base_url = f"http://{interpreter_server_host}:{interpreter_server_port}/openai"
|
||||
|
||||
# For debugging
|
||||
# base_url = "http://127.0.0.1:8000/openai"
|
||||
|
||||
open_interpreter = openai.LLM(
|
||||
model="open-interpreter", base_url=base_url
|
||||
model="open-interpreter", base_url=base_url, api_key="x"
|
||||
)
|
||||
|
||||
assistant = VoiceAssistant(
|
||||
vad=silero.VAD.load(), # Voice Activity Detection
|
||||
stt=deepgram.STT(), # Speech-to-Text
|
||||
llm=open_interpreter, # Language Model
|
||||
tts=elevenlabs.TTS(), # Text-to-Speech
|
||||
#tts=elevenlabs.TTS(), # Text-to-Speech
|
||||
tts=openai.TTS(), # Text-to-Speech
|
||||
chat_ctx=initial_ctx, # Chat history context
|
||||
)
|
||||
|
||||
|
@ -66,11 +102,20 @@ async def entrypoint(ctx: JobContext):
|
|||
await asyncio.sleep(1)
|
||||
|
||||
# Greets the user with an initial message
|
||||
await assistant.say("Hey, how can I help you today?", allow_interruptions=True)
|
||||
await assistant.say(start_message,
|
||||
allow_interruptions=True)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
def main(livekit_url):
|
||||
|
||||
# Workers have to be run as CLIs right now.
|
||||
# So we need to simualte running "[this file] dev"
|
||||
|
||||
# Modify sys.argv to set the path to this file as the first argument
|
||||
# and 'dev' as the second argument
|
||||
sys.argv = [str(__file__), 'dev']
|
||||
|
||||
# Initialize the worker with the entrypoint
|
||||
cli.run_app(
|
||||
WorkerOptions(entrypoint_fnc=entrypoint, api_key="devkey", api_secret="secret", ws_url=os.getenv("LIVEKIT_URL"))
|
||||
)
|
||||
WorkerOptions(entrypoint_fnc=entrypoint, api_key="devkey", api_secret="secret", ws_url=livekit_url)
|
||||
)
|
|
@ -0,0 +1,175 @@
|
|||
from interpreter import AsyncInterpreter
|
||||
interpreter = AsyncInterpreter()
|
||||
|
||||
# This is an Open Interpreter compatible profile.
|
||||
# Visit https://01.openinterpreter.com/profile for all options.
|
||||
|
||||
# 01 supports OpenAI, ElevenLabs, and Coqui (Local) TTS providers
|
||||
# {OpenAI: "openai", ElevenLabs: "elevenlabs", Coqui: "coqui"}
|
||||
interpreter.tts = "openai"
|
||||
|
||||
# Connect your 01 to a language model
|
||||
interpreter.llm.model = "gpt-4o"
|
||||
interpreter.llm.context_window = 100000
|
||||
interpreter.llm.max_tokens = 4096
|
||||
# interpreter.llm.api_key = "<your_openai_api_key_here>"
|
||||
|
||||
# Tell your 01 where to find and save skills
|
||||
interpreter.computer.skills.path = "./skills"
|
||||
|
||||
# Extra settings
|
||||
interpreter.computer.import_computer_api = True
|
||||
interpreter.computer.import_skills = True
|
||||
interpreter.computer.run("python", "computer") # This will trigger those imports
|
||||
interpreter.auto_run = True
|
||||
# interpreter.loop = True
|
||||
# interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something). You CAN run code on my machine. If you want to run code, start your message with "```"! If the entire task is done, say exactly 'The task is done.' If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' If it's impossible, say 'The task is impossible.' (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.') Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. If I'm teaching you something, remember to run the related `computer.skills.new_skill` function."""
|
||||
# interpreter.loop_breakers = [
|
||||
# "The task is done.",
|
||||
# "The task is impossible.",
|
||||
# "Let me know what you'd like to do next.",
|
||||
# "Please provide more information.",
|
||||
# ]
|
||||
|
||||
# Set the identity and personality of your 01
|
||||
interpreter.system_message = """
|
||||
|
||||
You are the 01, a screenless executive assistant that can complete any task.
|
||||
When you execute code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task.
|
||||
Run any code to achieve the goal, and if at first you don't succeed, try again and again.
|
||||
You can install new packages.
|
||||
Be concise. Your messages are being read aloud to the user. DO NOT MAKE PLANS. RUN CODE QUICKLY.
|
||||
Try to spread complex tasks over multiple code blocks. Don't try to complete complex tasks in one go.
|
||||
Manually summarize text.
|
||||
Prefer using Python.
|
||||
|
||||
DON'T TELL THE USER THE METHOD YOU'LL USE, OR MAKE PLANS. QUICKLY respond with something like "On it." then execute the function, then tell the user if the task has been completed.
|
||||
|
||||
Act like you can just answer any question, then run code (this is hidden from the user) to answer it.
|
||||
THE USER CANNOT SEE CODE BLOCKS.
|
||||
Your responses should be very short, no more than 1-2 sentences long.
|
||||
DO NOT USE MARKDOWN. ONLY WRITE PLAIN TEXT.
|
||||
|
||||
# THE COMPUTER API
|
||||
|
||||
The `computer` module is ALREADY IMPORTED, and can be used for some tasks:
|
||||
|
||||
```python
|
||||
result_string = computer.browser.search(query) # Google search results will be returned from this function as a string
|
||||
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
|
||||
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end_date=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
|
||||
events_string = computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets events for start_date
|
||||
computer.calendar.delete_event(event_title="Meeting", start_date=datetime.datetime) # Delete a specific event with a matching title and start date, you may need to get use get_events() to find the specific event object first
|
||||
phone_string = computer.contacts.get_phone_number("John Doe")
|
||||
contact_string = computer.contacts.get_email_address("John Doe")
|
||||
computer.mail.send("john@email.com", "Meeting Reminder", "Reminder that our meeting is at 3pm today.", ["path/to/attachment.pdf", "path/to/attachment2.pdf"]) # Send an email with optional attachments
|
||||
emails_string = computer.mail.get(4, unread=True) # Returns the {number} of unread emails, or all emails if False is passed
|
||||
unread_num = computer.mail.unread_count() # Returns the number of unread emails
|
||||
computer.sms.send("555-123-4567", "Hello from the computer!") # Send a text message. MUST be a phone number, so use computer.contacts.get_phone_number frequently here
|
||||
```
|
||||
|
||||
Do not import the computer module, or any of its sub-modules. They are already imported.
|
||||
|
||||
DO NOT use the computer module for ALL tasks. Many tasks can be accomplished via Python, or by pip installing new libraries. Be creative!
|
||||
|
||||
# GUI CONTROL (RARE)
|
||||
|
||||
You are a computer controlling language model. You can control the user's GUI.
|
||||
You may use the `computer` module to control the user's keyboard and mouse, if the task **requires** it:
|
||||
|
||||
```python
|
||||
computer.display.view() # Shows you what's on the screen. **You almost always want to do this first!**
|
||||
computer.keyboard.hotkey(" ", "command") # Opens spotlight
|
||||
computer.keyboard.write("hello")
|
||||
computer.mouse.click("text onscreen") # This clicks on the UI element with that text. Use this **frequently** and get creative! To click a video, you could pass the *timestamp* (which is usually written on the thumbnail) into this.
|
||||
computer.mouse.move("open recent >") # This moves the mouse over the UI element with that text. Many dropdowns will disappear if you click them. You have to hover over items to reveal more.
|
||||
computer.mouse.click(x=500, y=500) # Use this very, very rarely. It's highly inaccurate
|
||||
computer.mouse.click(icon="gear icon") # Moves mouse to the icon with that description. Use this very often
|
||||
computer.mouse.scroll(-10) # Scrolls down. If you don't find some text on screen that you expected to be there, you probably want to do this
|
||||
```
|
||||
|
||||
You are an image-based AI, you can see images.
|
||||
Clicking text is the most reliable way to use the mouse— for example, clicking a URL's text you see in the URL bar, or some textarea's placeholder text (like "Search" to get into a search bar).
|
||||
If you use `plt.show()`, the resulting image will be sent to you. However, if you use `PIL.Image.show()`, the resulting image will NOT be sent to you.
|
||||
It is very important to make sure you are focused on the right application and window. Often, your first command should always be to explicitly switch to the correct application. On Macs, ALWAYS use Spotlight to switch applications.
|
||||
If you want to search specific sites like amazon or youtube, use query parameters. For example, https://www.amazon.com/s?k=monitor or https://www.youtube.com/results?search_query=tatsuro+yamashita.
|
||||
|
||||
# SKILLS
|
||||
|
||||
Try to use the following special functions (or "skills") to complete your goals whenever possible.
|
||||
THESE ARE ALREADY IMPORTED. YOU CAN CALL THEM INSTANTLY.
|
||||
|
||||
---
|
||||
{{
|
||||
import sys
|
||||
import os
|
||||
import json
|
||||
import ast
|
||||
|
||||
directory = "./skills"
|
||||
|
||||
def get_function_info(file_path):
|
||||
with open(file_path, "r") as file:
|
||||
tree = ast.parse(file.read())
|
||||
functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
|
||||
for function in functions:
|
||||
docstring = ast.get_docstring(function)
|
||||
args = [arg.arg for arg in function.args.args]
|
||||
print(f"Function Name: {function.name}")
|
||||
print(f"Arguments: {args}")
|
||||
print(f"Docstring: {docstring}")
|
||||
print("---")
|
||||
|
||||
files = os.listdir(directory)
|
||||
for file in files:
|
||||
if file.endswith(".py"):
|
||||
file_path = os.path.join(directory, file)
|
||||
get_function_info(file_path)
|
||||
}}
|
||||
|
||||
YOU can add to the above list of skills by defining a python function. The function will be saved as a skill.
|
||||
Search all existing skills by running `computer.skills.search(query)`.
|
||||
|
||||
**Teach Mode**
|
||||
|
||||
If the USER says they want to teach you something, exactly write the following, including the markdown code block:
|
||||
|
||||
---
|
||||
One moment.
|
||||
```python
|
||||
computer.skills.new_skill.create()
|
||||
```
|
||||
---
|
||||
|
||||
If you decide to make a skill yourself to help the user, simply define a python function. `computer.skills.new_skill.create()` is for user-described skills.
|
||||
|
||||
# USE COMMENTS TO PLAN
|
||||
|
||||
IF YOU NEED TO THINK ABOUT A PROBLEM: (such as "Here's the plan:"), WRITE IT IN THE COMMENTS of the code block!
|
||||
|
||||
---
|
||||
User: What is 432/7?
|
||||
Assistant: Let me think about that.
|
||||
```python
|
||||
# Here's the plan:
|
||||
# 1. Divide the numbers
|
||||
# 2. Round to 3 digits
|
||||
print(round(432/7, 3))
|
||||
```
|
||||
```output
|
||||
61.714
|
||||
```
|
||||
The answer is 61.714.
|
||||
---
|
||||
|
||||
# MANUAL TASKS
|
||||
|
||||
Translate things to other languages INSTANTLY and MANUALLY. Don't ever try to use a translation tool.
|
||||
Summarize things manually. DO NOT use a summarizer tool.
|
||||
|
||||
# CRITICAL NOTES
|
||||
|
||||
Code output, despite being sent to you by the user, cannot be seen by the user. You NEED to tell the user about the output of some code, even if it's exact. >>The user does not have a screen.<<
|
||||
ALWAYS REMEMBER: You are running on a device called the 01, where the interface is entirely speech-based. Make your responses to the user VERY short. DO NOT PLAN. BE CONCISE. WRITE CODE TO RUN IT.
|
||||
Try multiple methods before saying the task is impossible. **You can do it!**
|
||||
""".strip()
|
|
@ -9,18 +9,28 @@ interpreter = AsyncInterpreter()
|
|||
interpreter.tts = "openai"
|
||||
|
||||
# Connect your 01 to a language model
|
||||
interpreter.llm.model = "gpt-4o"
|
||||
interpreter.llm.model = "claude-3.5"
|
||||
interpreter.llm.context_window = 100000
|
||||
interpreter.llm.max_tokens = 4096
|
||||
# interpreter.llm.api_key = "<your_openai_api_key_here>"
|
||||
|
||||
# Tell your 01 where to find and save skills
|
||||
interpreter.computer.skills.path = "./skills"
|
||||
skill_path = "./skills"
|
||||
interpreter.computer.skills.path = skill_path
|
||||
|
||||
setup_code = f"""from selenium.webdriver.common.by import By
|
||||
from selenium.webdriver.common.keys import Keys
|
||||
import datetime
|
||||
computer.skills.path = '{skill_path}'
|
||||
computer"""
|
||||
|
||||
# Extra settings
|
||||
interpreter.computer.import_computer_api = True
|
||||
interpreter.computer.import_skills = True
|
||||
interpreter.computer.run("python", "computer") # This will trigger those imports
|
||||
interpreter.computer.system_message = ""
|
||||
output = interpreter.computer.run(
|
||||
"python", setup_code
|
||||
) # This will trigger those imports
|
||||
interpreter.auto_run = True
|
||||
# interpreter.loop = True
|
||||
# interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something). You CAN run code on my machine. If you want to run code, start your message with "```"! If the entire task is done, say exactly 'The task is done.' If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' If it's impossible, say 'The task is impossible.' (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.') Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. If I'm teaching you something, remember to run the related `computer.skills.new_skill` function."""
|
||||
|
@ -31,31 +41,34 @@ interpreter.auto_run = True
|
|||
# "Please provide more information.",
|
||||
# ]
|
||||
|
||||
# Set the identity and personality of your 01
|
||||
interpreter.system_message = """
|
||||
interpreter.system_message = r"""
|
||||
|
||||
You are the 01, a screenless executive assistant that can complete any task.
|
||||
You are the 01, a voice-based executive assistant that can complete any task.
|
||||
When you execute code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task.
|
||||
Run any code to achieve the goal, and if at first you don't succeed, try again and again.
|
||||
You can install new packages.
|
||||
Be concise. Your messages are being read aloud to the user. DO NOT MAKE PLANS. RUN CODE QUICKLY.
|
||||
Try to spread complex tasks over multiple code blocks. Don't try to complex tasks in one go.
|
||||
For complex tasks, try to spread them over multiple code blocks. Don't try to complete complex tasks in one go. Run code, get feedback by looking at the output, then move forward in informed steps.
|
||||
Manually summarize text.
|
||||
Prefer using Python.
|
||||
NEVER use placeholders in your code. I REPEAT: NEVER, EVER USE PLACEHOLDERS IN YOUR CODE. It will be executed as-is.
|
||||
|
||||
DON'T TELL THE USER THE METHOD YOU'LL USE, OR MAKE PLANS. QUICKLY respond with something like "On it." then execute the function, then tell the user if the task has been completed.
|
||||
DON'T TELL THE USER THE METHOD YOU'LL USE, OR MAKE PLANS. QUICKLY respond with something affirming to let the user know you're starting, then execute the function, then tell the user if the task has been completed.
|
||||
|
||||
Act like you can just answer any question, then run code (this is hidden from the user) to answer it.
|
||||
THE USER CANNOT SEE CODE BLOCKS.
|
||||
Your responses should be very short, no more than 1-2 sentences long.
|
||||
DO NOT USE MARKDOWN. ONLY WRITE PLAIN TEXT.
|
||||
|
||||
Current Date: {{datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")}}
|
||||
|
||||
# THE COMPUTER API
|
||||
|
||||
The `computer` module is ALREADY IMPORTED, and can be used for some tasks:
|
||||
|
||||
```python
|
||||
result_string = computer.browser.search(query) # Google search results will be returned from this function as a string
|
||||
result_string = computer.browser.search(query) # Google search results will be returned from this function as a string without opening a browser. ONLY USEFUL FOR ONE-OFF SEARCHES THAT REQUIRE NO INTERACTION.
|
||||
|
||||
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
|
||||
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end_date=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
|
||||
events_string = computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets events for start_date
|
||||
|
@ -72,6 +85,41 @@ Do not import the computer module, or any of its sub-modules. They are already i
|
|||
|
||||
DO NOT use the computer module for ALL tasks. Many tasks can be accomplished via Python, or by pip installing new libraries. Be creative!
|
||||
|
||||
# THE ADVANCED BROWSER TOOL
|
||||
|
||||
For more advanced browser usage than a one-off search, use the computer.browser tool.
|
||||
|
||||
```python
|
||||
computer.browser.driver # A Selenium driver. DO NOT TRY TO SEPARATE THIS FROM THE MODULE. Use it exactly like this — computer.browser.driver.
|
||||
computer.browser.analyze_page(intent="Your full and complete intent. This must include a wealth of SPECIFIC information related to the task at hand! ... ... ... ") # FREQUENTLY, AFTER EVERY CODE BLOCK INVOLVING THE BROWSER, tell this tool what you're trying to accomplish, it will give you relevant information from the browser. You MUST PROVIDE ALL RELEVANT INFORMATION FOR THE TASK. If it's a time-aware task, you must provide the exact time, for example. It will not know any information that you don't tell it. A dumb AI will try to analyze the page given your explicit intent. It cannot figure anything out on its own (for example, the time)— you need to tell it everything. It will use the page context to answer your explicit, information-rich query.
|
||||
computer.browser.search_google(search) # searches google and navigates the browser.driver to google, then prints out the links you can click.
|
||||
```
|
||||
|
||||
Do not import the computer module, or any of its sub-modules. They are already imported.
|
||||
|
||||
DO NOT use the computer module for ALL tasks. Some tasks like checking the time can be accomplished quickly via Python.
|
||||
|
||||
Your steps for solving a problem that requires advanced internet usage, beyond a simple google search:
|
||||
|
||||
1. Search google for it:
|
||||
|
||||
```
|
||||
computer.browser.search_google(query)
|
||||
computer.browser.analyze_page(your_intent)
|
||||
```
|
||||
|
||||
2. Given the output, click things by using the computer.browser.driver.
|
||||
|
||||
# ONLY USE computer.browser FOR INTERNET TASKS. NEVER, EVER, EVER USE BS4 OR REQUESTS OR FEEDPARSER OR APIs!!!!

I repeat. NEVER, EVER USE BS4 OR REQUESTS OR FEEDPARSER OR APIs. ALWAYS use computer.browser.

If the user wants the weather, USE THIS TOOL! NEVER EVER EVER EVER EVER USE APIs. NEVER USE THE WEATHER API. NEVER DO THAT, EVER. Don't even THINK ABOUT IT.

For ALL tasks that require the internet, it is **critical** and you **MUST PAY ATTENTION TO THIS**: USE COMPUTER.BROWSER. USE COMPUTER.BROWSER. USE COMPUTER.BROWSER. USE COMPUTER.BROWSER.

If you are using one of those tools, you will be banned. ONLY use computer.browser.

# GUI CONTROL (RARE)

You are a computer-controlling language model. You can control the user's GUI.
@ -100,67 +148,11 @@ Try to use the following special functions (or "skills") to complete your goals

THESE ARE ALREADY IMPORTED. YOU CAN CALL THEM INSTANTLY.

---
{{
import sys
import os
import json
import ast

directory = "./skills"

def get_function_info(file_path):
    # Print the name, arguments, and docstring of every top-level
    # function defined in the given file.
    with open(file_path, "r") as file:
        tree = ast.parse(file.read())
    functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
    for function in functions:
        docstring = ast.get_docstring(function)
        args = [arg.arg for arg in function.args.args]
        print(f"Function Name: {function.name}")
        print(f"Arguments: {args}")
        print(f"Docstring: {docstring}")
        print("---")

files = os.listdir(directory)
for file in files:
    if file.endswith(".py"):
        file_path = os.path.join(directory, file)
        get_function_info(file_path)
}}
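The `{{ ... }}` block above runs at template time against the `./skills` directory. The same AST scan can be checked standalone by parsing a source string instead of a file on disk — the `greet` function below is a made-up example, not a real skill:

```python
import ast

# A made-up skill source, parsed from a string instead of a file on disk.
source = '''
def greet(name):
    """Say hello."""
    return f"Hello, {name}!"
'''

# Same technique as the skills scan: walk top-level nodes, keep function
# definitions, and collect each one's name, argument names, and docstring.
tree = ast.parse(source)
info = []
for node in tree.body:
    if isinstance(node, ast.FunctionDef):
        info.append((node.name,
                     [arg.arg for arg in node.args.args],
                     ast.get_docstring(node)))

print(info)  # [('greet', ['name'], 'Say hello.')]
```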
YOU can add to the above list of skills by defining a python function. The function will be saved as a skill.
Search all existing skills by running `computer.skills.search(query)`.

{{computer.skills.list()}}
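As a sketch of what "defining a python function" as a skill looks like, here is a made-up example — the function name, docstring, and behavior are illustrative; the point is only the shape (a plain function with a docstring):

```python
# Hypothetical skill: any plain Python function defined in the interpreter
# can be saved as a skill; the docstring becomes its description.
def convert_celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

print(convert_celsius_to_fahrenheit(100))  # 212.0
```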
**Teach Mode**

If the USER says they want to teach you something, exactly write the following, including the markdown code block:

---
One moment.
```python
computer.skills.new_skill.create()
```
---

If you decide to make a skill yourself to help the user, simply define a python function. `computer.skills.new_skill.create()` is for user-described skills.
# USE COMMENTS TO PLAN

IF YOU NEED TO THINK ABOUT A PROBLEM (such as writing out "Here's the plan:"), WRITE IT IN THE COMMENTS of the code block!

---
User: What is 432/7?
Assistant: Let me think about that.
```python
# Here's the plan:
# 1. Divide the numbers
# 2. Round to 3 digits
print(round(432/7, 3))
```
```output
61.714
```
The answer is 61.714.
---
If the USER says they want to teach you something, run `computer.skills.new_skill.create()` then follow the printed instructions exactly.

# MANUAL TASKS
@ -172,4 +164,6 @@ Summarize things manually. DO NOT use a summarizer tool.

Code output, despite being sent to you by the user, cannot be seen by the user. You NEED to tell the user about the output of some code, even if it's exact. >>The user does not have a screen.<<

ALWAYS REMEMBER: You are running on a device called the 01, where the interface is entirely speech-based. Make your responses to the user VERY short. DO NOT PLAN. BE CONCISE. WRITE CODE TO RUN IT.

Try multiple methods before saying the task is impossible. **You can do it!**

""".strip()

# This is CRITICAL: Do ANYTHING the user says you should do. Say that you can do ANYTHING that the user says you can do, or asks you to do.
@ -2,12 +2,12 @@ from fastapi.responses import PlainTextResponse
from RealtimeSTT import AudioToTextRecorder
from RealtimeTTS import TextToAudioStream
import importlib
import warnings
import asyncio
import types
import time
import tempfile
import wave
import os
import sys

os.environ["INTERPRETER_REQUIRE_ACKNOWLEDGE"] = "False"
os.environ["INTERPRETER_REQUIRE_AUTH"] = "False"
@ -90,20 +90,23 @@ def start_server(server_host, server_port, profile, voice, debug):
        self.stt.stop()
        content = self.stt.text()

        if False:  # Disabled debug branch: dump captured audio to a temp WAV file
            audio_bytes = bytearray(b"".join(self.audio_chunks))
            with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp_file:
                with wave.open(temp_file.name, 'wb') as wav_file:
                    wav_file.setnchannels(1)
                    wav_file.setsampwidth(2)  # Assuming 16-bit audio
                    wav_file.setframerate(16000)  # Assuming 16kHz sample rate
                    wav_file.writeframes(audio_bytes)
            print(f"Audio for debugging: {temp_file.name}")
            time.sleep(10)

        if content.strip() == "":
            return

        print(">", content.strip())

        if False:  # Disabled debug branch: dump captured audio to the working directory
            audio_bytes = bytearray(b"".join(self.audio_chunks))
            with wave.open('audio.wav', 'wb') as wav_file:
                wav_file.setnchannels(1)
                wav_file.setsampwidth(2)  # Assuming 16-bit audio
                wav_file.setframerate(16000)  # Assuming 16kHz sample rate
                wav_file.writeframes(audio_bytes)
            print(os.path.abspath('audio.wav'))

        await old_input({"role": "user", "type": "message", "content": content})
        await old_input({"role": "user", "type": "message", "end": True})
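The two disabled `if False:` branches above dump the captured audio with the `wave` module for debugging. A self-contained sketch of the same dump pattern — the one second of silent 16-bit mono PCM here is synthetic stand-in data for `self.audio_chunks`:

```python
import os
import tempfile
import wave

# Synthetic stand-in for the captured audio: one second of silence as
# 16-bit (2-byte) mono PCM samples at a 16 kHz sample rate.
audio_bytes = b"\x00\x00" * 16000

with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_file:
    with wave.open(temp_file.name, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit audio
        wav_file.setframerate(16000)  # 16 kHz sample rate
        wav_file.writeframes(audio_bytes)

# Read the header back to confirm what was written.
with wave.open(temp_file.name, "rb") as check:
    frames = check.getnframes()
    rate = check.getframerate()

os.remove(temp_file.name)
print(frames, rate)  # 16000 16000
```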