Tinderbox Meetup Saturday 23 Mar. 2024 Video: Scatterplot Map View with Allen Mikaelian

Tinderbox Weekly Meetup Saturday 23 Mar. 2024: Scatterplot Map View with Allen Mikaelian

Level Intermediate
Published Date 03/23/24
Revision 1
Type Meetup
Tags 5CKMEl, 5Cs of Knowledge Management and Exchange, Tinderbox, aTbRef
Video Length 00:00
Video URL TBD
Example File TBD
TBX Version 9.7
Host Michael Becker

SPOILER ALERT: The video of this session did not get recorded, and the chat was not saved. Something went wrong. @satikusala will try to work with Allen to do a second recording (Allen’s efforts are just that good! We don’t want to miss them).

In this Tinderbox Meetup, Allen Mikaelian (@AJayM), a writer, historian, researcher, and self-taught programmer, shared his method for creating scatterplots in the Tinderbox Map view. The scatterplots are generated using action code and locally hosted large language models that he has trained on his own data and integrated with OpenAI to enrich the results that OpenAI ChatGPT 3.5 returns. Also, Mark Anderson (@mwra) shared his effort to integrate fuzzy search into aTbRef, and Michael Becker (@satikusala) briefly showed how to use a parent designator in a .each loop operator.

Resources

Meeting Summary for Tinderbox Weekly Meetup 23 MAR 24

Quick recap

The team discussed various topics including a JSON export issue, preferences for email clients, and the use of Tinderbox for creating scatter plots and analyzing data. They also explored the potential of large language models in work environments and the challenges of selecting suitable models for specific tasks. Mark introduced a new ‘fuzzy search’ feature and emphasized the importance of making information available to others. Personal updates and future plans were also discussed, with Michael suggesting the recording of their session and Mark preparing for the next meeting.

Reviewing Past Event and Personal Tools

Michael, David, Anthony, and Mark discussed their previous event, with Michael questioning its value and Anthony being recommended to watch its recording. They then moved on to discuss their future steps, with Michael suggesting they record their session. A discussion on personal knowledge management tools followed, led by Michael, and Mark introduced a puzzlement, which he and Allen agreed to work on later. Allen presented his method of using Tinderbox to create scatter plots for his work, emphasizing the importance of the community and developer relationships. The team agreed to interrupt presentations for clarification or further discussion.

Scatter Plot Interpretation and Tinderbox Usage

Allen and Michael discussed the interpretation of data values displayed in a scatter plot, with Allen explaining that the X and Y values were used to display distances between notes related to a large language model. Allen further demonstrated how to use Tinderbox to plot per capita GDP against life expectancy, coloring each data point by continent, and how to create adornments for the X and Y axes. Michael, unfamiliar with Gapminder, was provided with the organization’s link for further information. Michael showed appreciation for Allen’s explanation and the resulting visualization.
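As a rough illustration of the plotting step, the sketch below rescales two data columns into map coordinates. It is not Allen’s action code: the function name `scale_to_map`, the sample figures, and the map bounds are all hypothetical, and it assumes that map Y coordinates grow downward, so the vertical axis is inverted to place larger values higher on the map.

```python
def scale_to_map(values, map_min, map_max, invert=False):
    """Linearly rescale data values into the range [map_min, map_max].

    With invert=True the largest value lands at map_min, which suits
    a map whose Y coordinate grows downward (an assumption here).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = []
    for value in values:
        # Put every point mid-range if all values are identical.
        t = 0.5 if span == 0 else (value - lo) / span
        if invert:
            t = 1.0 - t
        scaled.append(map_min + t * (map_max - map_min))
    return scaled


# Hypothetical sample data, not taken from the session.
gdp_per_capita = [1000.0, 5000.0, 9000.0]
life_expectancy = [50.0, 65.0, 80.0]

xpos = scale_to_map(gdp_per_capita, 0.0, 40.0)
ypos = scale_to_map(life_expectancy, 0.0, 30.0, invert=True)

for x, y in zip(xpos, ypos):
    print(f"Xpos={x:.1f}  Ypos={y:.1f}")
```

The resulting pairs could then be written to each note’s position attributes. A logarithmic axis for GDP could be had by taking `math.log10` of the values before scaling.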

Exploring AI-Enhanced Chatbots and Tools

Allen demonstrated the utility of various software tools in creating scatter plots, emphasizing the importance of these tools in exploring and analyzing data. He also discussed the concept of a retrieval-augmented generation system, using his own notes for AI-enhanced chatbots, and addressed Michael’s queries about specific AI models. The conversation ended with Michael suggesting Custom GPT, a tool for creating chatbots based on a corpus of documents, and they discussed the inconsistencies Allen experienced with ChatGPT.

Allen’s Exploration of Local Language Model Analysis

Allen shared his exploration of using a local large language model to analyze his notes, demonstrating how the model converts text into embeddings and predicts the next piece of text. He also presented his Python code, which reduced the model’s 800-dimensional embeddings to two dimensions for visualization. He explained that distance on the map represents dissimilarity between notes, so notes that sit farther apart are less alike, and shared his observations about the colors indicating the source of the notes. Mark Anderson clarified a point about the use of distances. Allen further demonstrated the functionality of a query system in Tinderbox, detailing how it can search for specific notes or summarize a container full of notes. He also mentioned the system’s ability to identify notes related to a particular theme or thesis and shared his experience with exporting all the notes into a database using the JSON Lines format.
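The summary does not say which reduction technique Allen’s script uses. Purely as an illustration, the sketch below applies principal component analysis (PCA) by power iteration, in plain Python, to project a few made-up 4-dimensional vectors onto two dimensions. Every name and every sample vector is a hypothetical stand-in for real embeddings.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]


def top_component(rows, iters=100, seed=0):
    """Leading principal direction of centered rows, by power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # Apply X^T X to v without building the matrix explicitly.
        proj = [dot(row, v) for row in rows]
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # no variance left in the data
        v = [x / norm for x in w]
    return v


def pca_2d(vectors):
    """Project vectors onto their first two principal components."""
    rows = center(vectors)
    pc1 = top_component(rows, seed=0)
    # Deflate: strip the pc1 part so the next direction is orthogonal to it.
    deflated = [[x - dot(row, pc1) * c for x, c in zip(row, pc1)] for row in rows]
    pc2 = top_component(deflated, seed=1)
    return [(dot(row, pc1), dot(d, pc2)) for row, d in zip(rows, deflated)]


# Hypothetical "embeddings": two pairs of similar notes.
embeddings = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
points = pca_2d(embeddings)
for x, y in points:
    print(f"{x:+.3f} {y:+.3f}")
```

Similar vectors land close together in the 2D output, which is the property the map relies on. Other techniques, such as t-SNE or UMAP, make different trade-offs and may suit real embeddings better.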

Model Variability, Python Script, and Metadata

Allen and Michael discussed the role of temperature as a variable in their model, with Allen explaining its impact on the model’s randomness. Allen further elaborated on the Python script, the software involved, and how the model retrieves relevant notes using Chroma. Towards the end, they delved into the functionality of a language model that retrieves metadata and notes, with Allen expressing his preference for the model to compare note content rather than metadata. Allen also highlighted the importance of including references in the model’s output and the ability to specify the note the reference came from.
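To make the effect of temperature concrete, here is a minimal sketch of how it is commonly applied: the model’s raw scores are divided by the temperature before being turned into probabilities. The scores below are invented, and the function name is hypothetical; nothing here is taken from Allen’s script.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Invented scores for three candidate next tokens.
scores = [2.0, 1.0, 0.1]

cool = softmax_with_temperature(scores, 0.2)  # low temperature
warm = softmax_with_temperature(scores, 2.0)  # high temperature

print("low temperature: ", [round(p, 3) for p in cool])
print("high temperature:", [round(p, 3) for p in warm])
```

At a low temperature almost all of the probability goes to the top-scoring token, so output is more repeatable. At a higher temperature the probabilities even out and sampling becomes more varied.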

Chatbot Response Generation and Metadata

Allen explained the process of generating responses with either a local chatbot or an OpenAI model such as GPT-3.5 Turbo. He clarified that any chatbot with an API can be used, and the response can be customized by providing it with specific instructions. Allen also addressed Michael’s concerns about intellectual property and the potential for data misuse. Furthermore, he explained the use of metadata, emphasizing that it is not used in the creation of embeddings or during similarity searches. Finally, he demonstrated how to find a model on Hugging Face.
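The split between embeddings and metadata can be sketched as follows. This is a plain-Python stand-in, not the Chroma API and not Allen’s code: the record layout, the function names, and the tiny 3-dimensional vectors are all hypothetical. Ranking looks only at the embeddings, while the metadata rides along so the prompt can name the note each passage came from.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, records, k=2):
    """Rank records by embedding similarity only; metadata is ignored here."""
    ranked = sorted(
        records,
        key=lambda rec: cosine_similarity(query_embedding, rec["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, hits):
    """Build a prompt that labels each passage with its source note name."""
    lines = ["Answer using only the notes below and cite the note names.", ""]
    for rec in hits:
        lines.append(f'[{rec["metadata"]["note"]}] {rec["text"]}')
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hypothetical records; real embeddings would have hundreds of dimensions.
records = [
    {"text": "Maps show spatial relations.", "embedding": [0.9, 0.1, 0.0],
     "metadata": {"note": "Note A"}},
    {"text": "Agents gather aliases.", "embedding": [0.0, 0.2, 0.9],
     "metadata": {"note": "Note B"}},
    {"text": "Adornments label map regions.", "embedding": [0.7, 0.6, 0.1],
     "metadata": {"note": "Note C"}},
]

hits = retrieve([1.0, 0.0, 0.0], records, k=2)
prompt = build_prompt("What do my notes say about maps?", hits)
print(prompt)
```

The assembled prompt could be sent to any chatbot that exposes an API, local or hosted.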

Allen’s Machine Learning Challenges and Tools

Allen discussed his ongoing experimentation with various machine learning models for specific tasks and the challenges faced in selecting a suitable model. He introduced a new tool that ingests his documents to provide more nuanced responses and compared its performance to a regular chatbot. Allen also highlighted the challenges with the retrieval process in his vector database and the balance between speed and accuracy. Furthermore, he presented a scatterplot tool integrated within Tinderbox, which he used to explore relationships within a dataset and envisioned its potential as an AI assistant for the project. David asked for clarification about the dataset, while Michael expressed surprise and confusion about the tool’s potential applications.

Exploring Tinderbox, Language Models, and Workflow

Allen, a writer and historian, discussed his use of Tinderbox for his workflow and highlighted the potential impact of large language models in work environments. He also addressed concerns about confidentiality when using commercial systems, suggesting that local models could be a more appropriate solution. Additionally, Allen encouraged the exploration of resources like OpenAI’s Assistants feature and document loaders. Meanwhile, Michael shared a solution he created for grading student work using a rubric and a stamp system. The conversation ended with Mark Anderson preparing to present on fuzzy search.

Fuzzy Search Feature and RSS Feeds

Mark Anderson introduced a new ‘fuzzy search’ feature, inspired by Jack’s blog, which uses the Fuse.js library to find items that are similar to the search query. He also discussed the functionality of RSS feeds, templates, and the process of maintaining and optimizing links. Furthermore, he highlighted the importance of making information available to others to avoid duplication of effort. Dave highlighted the value of the fuzzy search in Tinderbox.
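For readers new to the idea, the sketch below shows fuzzy matching in general terms: a misspelled query still finds the intended title. It uses Python’s standard `difflib` module as a stand-in, so the matching algorithm and scores differ from what Fuse.js does on the aTbRef site. The function name, the title list, and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_search(query, titles, threshold=0.6):
    """Return titles whose similarity to the query meets the threshold,
    best match first. Comparison is case-insensitive."""
    needle = query.lower()
    scored = []
    for title in titles:
        ratio = SequenceMatcher(None, needle, title.lower()).ratio()
        if ratio >= threshold:
            scored.append((ratio, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical page titles.
titles = ["Adornment", "Agent", "Attribute", "Export template"]

# A transposed-letter typo still finds the intended page.
matches = fuzzy_search("adornmnet", titles)
print(matches)
```

Raising the threshold makes matching stricter, and lowering it admits looser matches at the cost of more false hits.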

What were your top 2~3 key takeaways from this session?

Please comment

What do you want to learn next? Learn more about?

What exercises would help reinforce your learning?

A re-recording sounds promising @satikusala

Please make sure to let us know once this will have happened.

Thank you