Map Your Data Sources To Improve Data Quality | Apple's Data Scraping Efforts | OpenAI Release Update to Assistants API | Meta Release a New Type of Diffusion Model

The Bright Journey with AI

📆 September 3rd, 2024 📆

Today we begin a new series of pieces, articles, and general ranting about data & AI. Starting from the very beginning: can you confidently say where all your data comes from?

In the news

  • Apple are facing backlash over their new web-scraping practices, with many publishers opting out of having their content used

  • You now have more control over your OpenAI Assistants with the most recent update

  • Meta release a new type of model called a "Transfusion" model, allowing generation of text and images in a single model architecture

Dive in for all the details!

🤖 Unlock AI 🤖

You're Not Ready

Not what you expected? I am truly sorry but worry not, we'll get there.

The last two years have been an absolute rollercoaster; ever since OpenAI pushed that button, the world has been stirred to dream of AI and what it can do for us. As we settle from the hype and initial excitement, we must find the path forward to realise true value from this technology. That means creating new products & services which both delight and empower customers to do more.

So what do I mean by "you're not ready"? As with any software product or project, there are multiple components which lead to its success or failure. Many people only know the media's interpretation of AI, with its catchy headlines and seemingly boundless opportunities. The reality, I'm afraid, is much more boring, but it will deliver results. Let's start with something simple: getting a lay of the land.

Know Your Source Data 

In order to be successful in data & AI projects, you must first understand the building blocks you are working with. Data quality, stability, observability, alerting, and all the other aspects of a managed data product depend on having clear sight of where the data comes from.

While this starts as a simple list, spending time analysing and documenting all your source data allows you to ask some questions.

Here are some that I will regularly start with:

  1. Is the data accessible?

  2. Is it already staged in the warehouse?

  3. Is it complete?

  4. How feasible is it to get at X cadence?

  5. Does it have adequate tests validating it?

  6. Do you have observability & alerting applied to it?

  7. How important is this data compared to other sources?
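As an illustration, the checklist above can be captured as one structured record per source, so the answers live in an inventory rather than in someone's head. The field and source names here are my own invented examples, not from any particular catalog tool:

```python
from dataclasses import dataclass

@dataclass
class SourceDataset:
    """One entry in the source-data inventory, mirroring the checklist."""
    name: str
    accessible: bool           # 1. Is the data accessible?
    staged_in_warehouse: bool  # 2. Is it already staged in the warehouse?
    complete: bool             # 3. Is it complete?
    cadence: str               # 4. At what cadence can we feasibly get it?
    tested: bool               # 5. Does it have adequate tests validating it?
    monitored: bool            # 6. Observability & alerting applied?
    priority: int              # 7. Importance relative to other sources (1 = highest)

# Hypothetical inventory entries for illustration.
inventory = [
    SourceDataset("crm_contacts", True, True, True, "daily", True, False, 1),
    SourceDataset("web_clickstream", True, False, False, "hourly", False, False, 2),
]

# Sources that still need work before they can reliably feed a project:
gaps = [s.name for s in inventory
        if not (s.staged_in_warehouse and s.tested and s.monitored)]
```

Even this tiny structure makes the gaps queryable instead of anecdotal, which is the point of the exercise.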

By knowing these answers we can start to:

  1. Plan projects to get this data into a uniform and managed process

  2. Start to measure things like data quality & data health on an ongoing basis

  3. Measure progress in a difficult-to-measure domain

  4. Prioritise your most important data flows
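To make "measuring data health on an ongoing basis" concrete, here is a minimal sketch that turns yes/no answers per source into a single score you can track over time. The checks and weights are illustrative assumptions of mine, not an industry standard:

```python
# Weighted yes/no checks per source; weights are illustrative only.
CHECKS = {"accessible": 1, "staged": 2, "complete": 2, "tested": 3, "monitored": 2}

def health_score(answers: dict) -> float:
    """Fraction of weighted checks a source currently passes (0.0 to 1.0)."""
    total = sum(CHECKS.values())
    passed = sum(w for check, w in CHECKS.items() if answers.get(check))
    return passed / total

# Hypothetical answers for two sources.
sources = {
    "crm_contacts":    {"accessible": True, "staged": True, "complete": True,
                        "tested": True, "monitored": True},
    "web_clickstream": {"accessible": True, "staged": False, "complete": False,
                        "tested": False, "monitored": False},
}

# Worst-scoring sources surface first in the improvement backlog.
backlog = sorted(sources, key=lambda s: health_score(sources[s]))
```

Re-running the scoring each month gives you a trend line, which is exactly the "measure progress" lever the list above describes.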

One thing to keep in mind is that this is an ongoing effort: you will not capture every source data point, because new ones will come online or you'll simply miss some lower-level ones. The purpose of this exercise is to have a means to identify and organise source data so it can be used repeatedly in other projects without having to rework each time.

"You can't improve what you can't measure"

This statement is at the very heart of data management. Quite often the data space spans multiple teams, systems, or organisational boundaries. It is in these cases that the source of truth can escape us, leading to confusion, delay, and the failure of expensive initiatives. So do yourself a favour: care about source data, and carve out time for your team to start building that list and managing your data rather than letting it fall into reports. It is the fuel your car runs on, and the higher the quality, the better your chance of winning the race.

💫 Help Support my Work 💫

I really enjoy researching and writing about AI & appreciate your support. If you enjoy this content, please use any of the options below to help:

  • Buy Me a Coffee ☕️ - AI is really fuelled by coffee - keep the tank filled

  • Spread the Word 🗞️ - Share on social or directly with your friends

  • Follow me on X - Always happy to chat AI and software dev

  • Free 30 Minute Consult - There are many facets to data projects - I can help you chart a course and increase the chance of success.

📰 News 📰

Apple's AI Scraping Efforts Face Major Resistance from Publishers

Apple's introduction of Applebot-Extended, a tool allowing websites to opt out of having their data used for AI training, has been met with significant pushback from major publishers. Websites including The New York Times, The Atlantic, and Vox Media have blocked Applebot-Extended, reflecting broader concerns about intellectual property and the monetization of content. This development is part of a larger trend where publishers are increasingly cautious about how their data is utilized by AI companies, with many preferring explicit agreements or compensation for their content.

Applebot-Extended's emergence highlights the tension between tech companies and content creators. While Apple claims this tool respects publisher rights by providing an opt-out mechanism, critics argue that data use should be based on prior consent, not after-the-fact exclusion. This resistance underscores the growing complexity of AI ethics and the importance of navigating data rights in the digital age.

Read More At: Wired

OpenAI Empowers Developers with Enhanced Control Over AI Assistants

OpenAI has rolled out new updates to its Assistants API, granting developers greater control over how AI assistants manage and retrieve data. This enhancement, particularly in the File Search tool, allows developers to fine-tune how information is selected and used by the AI, improving the relevance and accuracy of responses. By adjusting the ranking mechanisms of search results, developers can now ensure that their AI models generate more contextually appropriate outputs, addressing a significant demand within the developer community.
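If I've read the update correctly, the new control lives in a ranking-options setting on the File Search tool. A sketch of what the request body might look like follows; the field names (`ranking_options`, `score_threshold`) are taken from my reading of OpenAI's documentation of this update, so verify them against the current API reference before relying on them:

```python
# Hypothetical Assistants API request body using the File Search
# ranking controls described in the update. Field names are assumptions
# based on OpenAI's docs at the time, not guaranteed current.
payload = {
    "model": "gpt-4o",
    "tools": [
        {
            "type": "file_search",
            "file_search": {
                # Drop retrieved chunks scoring below this relevance
                # threshold so only closely matching passages reach the model.
                "ranking_options": {"score_threshold": 0.6},
            },
        }
    ],
}

tool = payload["tools"][0]
threshold = tool["file_search"]["ranking_options"]["score_threshold"]
```

Raising the threshold trades recall for precision: fewer, more relevant chunks in the context, which is the "more contextually appropriate outputs" the update promises.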

This move by OpenAI is part of a broader push towards creating more autonomous AI agents capable of performing complex tasks with minimal human intervention. The updates have been met positively, with developers expressing relief over the newfound ability to customize their AI assistants more effectively. This is seen as a crucial step towards advancing AIā€™s utility in enterprise applications, where precision and control are paramount.

These developments align with industry trends, where companies like Google and Salesforce are also enhancing their AI platforms to offer more customizable and autonomous solutions for businesses.

Read More At: VentureBeat

Meta Unveils 'Transfusion': A Unified Model for Text and Image Processing

Meta has introduced "Transfusion," a ground-breaking AI model that seamlessly integrates language processing and image generation within a single Transformer architecture. Unlike traditional models that handle text and images separately, Transfusion processes images as sequences of patches, allowing them to be combined directly with text tokens. This innovative approach not only preserves the continuous representation of images but also enhances the model's ability to generate high-quality images and process text more effectively.
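To illustrate the "images as sequences of patches" idea, here is a toy sketch (my own illustration, not Meta's code) of how a small image can be split into flattened patch vectors and placed in the same sequence as text tokens:

```python
def patchify(image, patch=2):
    """Split an H x W image (list of rows) into flattened patch vectors,
    read left-to-right, top-to-bottom - the order a Transformer would see."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j]
                            for i in range(patch) for j in range(patch)])
    return patches

# Toy 4x4 "image" of pixel intensities.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = patchify(image)  # four patches of four values each

# In a Transfusion-style model, these continuous patch vectors sit in one
# sequence alongside discrete text tokens, so a single Transformer can
# attend across both modalities.
text_tokens = ["<caption>", "a", "tiny", "image"]
sequence = text_tokens + ["<img>"] + patches + ["</img>"]
```

The real model embeds patches continuously rather than as raw pixel lists, but the structural point is the same: one sequence, two data types.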

The Transfusion model, which features 7 billion parameters, has demonstrated impressive results, matching the image generation capabilities of specialized systems like DALL-E 2 while also improving text processing. Trained on a massive dataset of 2 trillion text and image tokens, Transfusion exemplifies the potential of unified multimodal AI models to handle complex tasks more efficiently and at a lower computational cost.

This development is a significant step toward more versatile AI systems, capable of managing a variety of data types and tasks within a single framework. Researchers are optimistic about the potential for further advancements, such as integrating additional modalities and refining training methods.

🔨 AI-Powered Tools 🔨

  1. Quick Magic AI is an advanced artificial-intelligence platform designed to produce efficient and accurate animations for businesses and individuals. QuickMagic converts 2D information about human joints in videos into 3D model motion data, simplifying 3D motion production compared with traditional animation workflows.

  2. AutoResponder.ai is an AI-driven tool that automates replies for various messaging platforms, including WhatsApp, Facebook Messenger, Instagram, and more. It allows businesses and individuals to set custom rules for automatic responses, significantly reducing the manual effort required to manage communication. The tool integrates with AI services like ChatGPT or Dialogflow for enhanced interactions and can trigger custom actions, making it highly versatile for managing high-volume messaging.

  3. Notion Sites is a tool for quickly creating and publishing websites without coding. It offers over 10,000 templates and a drag-and-drop interface, allowing users to customize domains, themes, and content with ease. Integrated AI assists in content creation, while SEO tools help optimize site visibility. It's ideal for building personal sites, portfolios, and more, with options for custom branding and Google Analytics integration.
