Summary of LLM Tools Usage Experience

ChatGPT o4-mini

Thu, Apr 17: The latest model, released within the past two days, with strong reasoning capabilities and excellent multi-step execution. In practical task execution, o4-mini appears to have completely surpassed o3-mini.

The following image shows an example of my usage. The question was: “In Black Mirror Season 7 Episode 3, there are many Asian characters. Are these Asians Chinese, Korean, Japanese, Malaysian, or Singaporean?” The model successfully called the search engine multiple times and obtained the correct answer:

Image.png

Additionally, Projects now works normally in this mode. Since the model searches files autonomously instead of relying on RAG, file search has become significantly more efficient.

ChatGPT (excluding o3, o4-mini)

  • GPT-4o: Versatile. Excellent for daily communication and learning, with a context window large enough to easily handle code and document retrieval tasks. Supports multiple attachment formats and online code execution. Drawback: output length is limited, so it is not suitable for very long content. In lengthy conversations, context compression is severe and it may forget earlier content.

  • Projects: Somewhat redundant. Suitable for scenarios requiring frequent searching and text retrieval across multiple files.

  • GPT-4o mini: Weaker than GPT-4o, but has a massive context window, recommended for translating very long texts.

  • o1: A decent chain-of-thought model, suitable for solving complex code and mathematical problems. Not good at handling emotional or intuitive problems. However, its chain of thought is too short and of limited quality. It's recommended to feed its output into Gemini 2.5 Pro or DeepSeek-R1 to improve quality.

  • o3-mini: Severe hallucinations, inferior to o1, but extremely fast reasoning speed.

  • Search: Slightly redundant. The new version's search results are heavily restricted and can even be inferior to model output without search. Suitable as a lightweight search alternative.

  • Deep Research: Excellent tool. Best quality among similar features, most comprehensive output results.

  • GPT-4.5-preview: Excellent. Massive context window, strong long-text comprehension. Has vast memory and strongest intuition. Not actually good at reasoning, but performs best due to low hallucinations.

  • Canvas: Overall inferior to Cursor. Context output too small, text length limited. Suitable for short-text scenarios with real-time editing.

  • Work with Apps on macOS: Far behind Cursor in functionality and experience; essentially a simplified version of Cursor.

Claude

  • Claude 3.7 Sonnet: Excellent, suitable for generating all kinds of code. The web version supports a massive context, retaining almost as much message history as the API. The model is very well suited to writing communication texts such as emails, with accurate, concise, and unpretentious wording. Drawback: limited usage in the free version.

DeepSeek

  • R1: Severe hallucinations and unstable performance that occasionally affect usability; requires very high-quality prompts. When the prompt is right or the context is complete, it outperforms o1. o1's output can be fed into R1 as input to improve quality.

  • V3: Excellent, an alternative to GPT-4o.

Grok

  • Grok-3: Generates text fluently and naturally, less AI-like, suitable for natural writing and novel creation. Overall mediocre, fewer productivity tools than GPT-4o.

  • Grok-3 + search: Excellent. Leverages English social media data, quickly analyzes news and current events. Outputs long content, barely filters search source content, stronger search capabilities than GPT-4o.

  • Grok-Deep Search: Slightly redundant, actually inferior to Grok-3 + search. Generated content heavily templated, affecting quality.

Gemini

  • 2.5 Pro: Excellent model, alternative to GPT-4o, minimal hallucinations, high-quality search results, complete and clear thought chains, strong logical reasoning. Supports integration with Google tools, very useful in specific scenarios, such as uploading screenshots or text to automatically create events in Google Calendar. Drawback: often claims to have performed searches when it actually hasn’t.

  • Deep Research: Average, slightly better than old GPT Search. Occasional comprehension deviations, search and document generation executed in stages, process fragmented, heavily templated. Currently the only product that can replace GPT Deep Research.

v0.dev

  • Suitable for writing and previewing frontend components online.

chat.qwen.ai

  • Qwen2.5-Max: Excellent model, an alternative to GPT-4o, with fast generation speed and support for thinking mode.

Zhihu Direct Answer

  • Supports searching Zhihu’s entire network content. Possible competitor is Xiaohongshu Direct Answer.

Perplexity

  • Supports basic search, mediocre quality, slightly redundant.

Mistral.ai

  • Fast, with a large context window; an alternative to GPT-4o mini.

Cursor

  • Excellent tool. Can index entire code repositories, suitable for large project development. Supports multiple model switching, local command line execution and terminal takeover, suitable for the following tasks:
    • Writing LaTeX documents, replacing Overleaf
    • Assisting with various assignments
    • Code repository search and specific feature location
    • Creating unit tests
    • Executing git operations
    • Executing deployment tasks
  • Can execute almost anything involving documents and the command line; helpful in development, debugging, and document writing
  • Truly the most Agent-like product at the current stage (though this claim no longer holds after the release of o3 and o4-mini, as they can also perform multi-step planning and task execution. However, o3 and o4-mini can only operate within their own limited environment, while Cursor can execute on the user's computer, which is an advantage.)

Example: Leetcode Tracker

There are many shared Leetcode premium accounts on Taobao, but sometimes we still need to add our problem-solving records to our own accounts. After purchasing a shared premium account, we can export a company’s problems as a CSV file, then export our own problem-solving records as another CSV file. By comparing these two files, we can easily track our current progress. Cursor is very good at writing such small tools:

Image.png

We just need to copy the elements from the webpage, and Cursor can write the complete CSV files for us and automatically generate a polished frontend interface, greatly improving our problem-solving efficiency.
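For reference, the core comparison between the two CSV files can be sketched in a few lines of Python. This is only a minimal sketch, not the tool Cursor generated: the `id` and `title` column names, the `track_progress` helper, and the sample rows are all assumptions made up for illustration, and real exports may use different headers.

```python
import csv
import io


def read_rows(csv_file):
    """Read a CSV file-like object into a list of dicts keyed by header."""
    return list(csv.DictReader(csv_file))


def track_progress(company_csv, solved_csv, id_column="id"):
    """Split the company's problems into solved and unsolved lists.

    `id_column` is a hypothetical column name; adjust it to the real export.
    """
    solved_ids = {
        row[id_column].strip()
        for row in read_rows(solved_csv)
        if row.get(id_column)
    }
    done, todo = [], []
    for row in read_rows(company_csv):
        if row[id_column].strip() in solved_ids:
            done.append(row)
        else:
            todo.append(row)
    return done, todo


# Made-up sample data standing in for the two exported files.
company = io.StringIO("id,title\n1,Two Sum\n2,Add Two Numbers\n3,Longest Substring\n")
solved = io.StringIO("id,title\n1,Two Sum\n3,Longest Substring\n")

done, todo = track_progress(company, solved)
print(f"{len(done)}/{len(done) + len(todo)} solved")  # prints "2/3 solved"
print([row["title"] for row in todo])  # prints ['Add Two Numbers']
```

With real exports, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="", encoding="utf-8")`.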
